nftables: preserve filter-FORWARD drop policy on restart/upgrade#50576

Closed
robmry wants to merge 2 commits into moby:master from robmry:nftables_filter_forward_drop

Conversation

@robmry
Contributor

@robmry robmry commented Jul 30, 2025

- What I did

See the linked issue for a description of the problems.

One option would be to always set the filter-FORWARD policy to "drop" for nftables, treating the introduction of nftables support as an opportunity to make a significant change. But, it's been tried before and reverted due to fallout, including this issue in the Debian tracker. Before #48594, when ip6tables was enabled, the policy for IPv6 was always set to DROP, and that caused problems when we enabled ip6tables by default. So, that PR aligned the IPv6 behaviour with the long-standing IPv4 behaviour. However, #48594 also added the daemon option `--ip-forward-no-drop`, which could now be used to deal with those problems. So, a new install of Docker could potentially break things on the host but, after some debugging, that problem could be fixed by manually setting the policy back to accept, adding daemon option `--ip-forward-no-drop`, and restarting the daemon. (But, I think Docker might be in the bin before then in many cases!)

Alternatively, go the other way - there's certainly a case to be made that Docker shouldn't enable IP forwarding on the host, or change the firewall policy to drop, unless it's told to. So, the daemon could error out if configured with IP forwarding enabled (the default) while forwarding isn't enabled in the kernel. It could then offer separate options for the user to tell it to enable forwarding, and to set the firewall's forwarding policy to "drop". But, that's quite a significant change for existing users: on a host where the daemon had previously been enabling forwarding, everything would appear fine after the upgrade, then the daemon would fail to run after a reboot. So, this needs more consideration, but it's probably where we should be heading.

For the time being at least, try to preserve the existing behaviour: only set the firewall policy to drop after enabling IP forwarding, and don't set the policy if forwarding was already enabled. But, when setting the policy to drop, remember that for the next daemon restart - the policy will then be set to drop on each subsequent restart (including after a reboot, even if IP forwarding was enabled by something else before dockerd started that time). That deals with the nftables restart case, and makes it less susceptible to changes in startup ordering... if Docker's going to mess with the policy, it'll always do so, and problems should be easier to track down. To stop it from setting the policy to drop on subsequent restarts, option `--ip-forward-no-drop` will be needed.
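The restart behaviour described above boils down to a small decision. Here is a sketch in Go; `setFilterForwardDrop` and its parameters are hypothetical names for illustration, not the PR's actual code:

```go
package main

import "fmt"

// setFilterForwardDrop decides whether the daemon should set the host's
// filter-FORWARD policy to "drop" on this start. Hypothetical sketch of the
// behaviour described above, not the PR's real implementation.
func setFilterForwardDrop(forwardingWasEnabled, droppedOnEarlierStart, ipForwardNoDrop bool) bool {
	if ipForwardNoDrop {
		// --ip-forward-no-drop always wins: never touch the policy.
		return false
	}
	if droppedOnEarlierStart {
		// A previous start set the policy to drop and recorded that in the
		// datastore, so keep doing it on every subsequent restart.
		return true
	}
	// Otherwise, only set the policy if the daemon itself is about to
	// enable IP forwarding (i.e. forwarding wasn't already enabled).
	return !forwardingWasEnabled
}

func main() {
	fmt.Println(setFilterForwardDrop(true, false, false))  // forwarding pre-enabled: leave policy alone
	fmt.Println(setFilterForwardDrop(false, false, false)) // daemon enables forwarding: drop
	fmt.Println(setFilterForwardDrop(true, true, false))   // recorded on an earlier start: drop again
	fmt.Println(setFilterForwardDrop(false, true, true))   // --ip-forward-no-drop: never drop
}
```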

- How I did it

First commit ...

Set filter-FORWARD drop on restart after enabling IP forwarding

The daemon sets the host's filter-FORWARD firewall chain policy to DROP if it enables IP forwarding.

With iptables, that policy is preserved over a daemon restart (the chain is not deleted, it's a built-in chain). But, with nftables, the daemon's table is re-created on restart, so the filter-FORWARD drop policy is lost.

So, when setting the policy to "drop" after enabling IP forwarding, record that in the datastore. Every subsequent daemon restart will then set the policy to drop (for iptables or nftables).

To stop the daemon from setting the policy, it must be started with option '--ip-forward-no-drop'.

Second commit ...

Migrate iptables filter-FORWARD policy on upgrade

When a pre-29.0 daemon has been running, and it set the iptables filter-FORWARD policy to DROP because it enabled IP forwarding, if the daemon is stopped, upgraded to >=29.0 and nftables is enabled...

The iptables policy will still be DROP. The newly started daemon won't need to enable IP forwarding, and there won't be a record in the datastore indicating that it needs to be set.

So, when migrating from iptables to nftables, check the iptables filter-FORWARD policy. If it's DROP, set the nftables policy to drop, and note that in the datastore for future restarts. Then, set the iptables policy to ACCEPT, so that it doesn't drop packets for published ports (because the allow rules for them will only be in the nftables chain).
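The migration step above can be sketched as a small pure function; `firewallState` and `migrateFilterForwardDrop` here are hypothetical stand-ins for the driver's real iptables/nftables and datastore handles:

```go
package main

import "fmt"

// firewallState is a stand-in for the daemon's real firewall and datastore
// state; all names in this sketch are hypothetical.
type firewallState struct {
	iptablesForwardPolicy string // "ACCEPT" or "DROP"
	nftForwardPolicy      string // "accept" or "drop"
	dropRecorded          bool   // persisted "set drop on restart" flag
}

// migrateFilterForwardDrop mirrors the migration described above: if the old
// iptables policy is DROP, carry it over to nftables, record it for future
// restarts, and reset iptables to ACCEPT so it doesn't drop published-port
// traffic whose allow rules now only exist in the nftables chain.
func migrateFilterForwardDrop(s *firewallState) {
	if s.iptablesForwardPolicy != "DROP" {
		return
	}
	s.nftForwardPolicy = "drop"
	s.dropRecorded = true
	s.iptablesForwardPolicy = "ACCEPT"
}

func main() {
	s := &firewallState{iptablesForwardPolicy: "DROP", nftForwardPolicy: "accept"}
	migrateFilterForwardDrop(s)
	fmt.Println(s.iptablesForwardPolicy, s.nftForwardPolicy, s.dropRecorded)
}
```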

- How to verify it

Manual test:

  • Run a pre-29.0 daemon with IPv6 enabled on the default bridge, and IPv4 and IPv6 forwarding disabled in the kernel. Make sure the filter-FORWARD chains' policies are both set to DROP (`iptables -L FORWARD`, `ip6tables -L FORWARD`).
  • Stop that daemon, start a 29.0 daemon, and check that the iptables filter-FORWARD policy is still DROP.
  • Restart that daemon with firewall-backend nftables. Check that the iptables DROP policies have been removed, and that the nftables policy for both filter-FORWARD chains is "drop" (`nft list ruleset`).

Updated unit and integration tests.

- Human readable description for the release notes

- If Docker enables IP forwarding on the host, it sets the default forwarding policy to "drop" in the firewall. Now, in addition, it will remember that it has set the policy and will set it again on each subsequent restart (including following reboots where it did not enable IP forwarding). To prevent the daemon from setting the firewall's forwarding policy, use option `--ip-forward-no-drop`.

The daemon sets the host's filter-FORWARD firewall chain policy to
DROP if it enables IP forwarding.

With iptables, that policy is preserved over a daemon restart (the
chain is not deleted, it's a built-in chain). But, with nftables,
the daemon's table is re-created on restart, so the filter-FORWARD
drop policy is lost.

So, when setting the policy to "drop" after enabling IP forwarding,
record that in the datastore. Every subsequent daemon restart will
then set the policy to drop (for iptables or nftables).

To stop the daemon from setting the policy, it must be started with
option '--ip-forward-no-drop'.

Signed-off-by: Rob Murray <rob.murray@docker.com>
@robmry robmry self-assigned this Jul 30, 2025
@robmry robmry added this to the 29.0.0 milestone Jul 30, 2025
@robmry robmry force-pushed the nftables_filter_forward_drop branch from 086e527 to fa2d5c4 on July 30, 2025 19:46
@robmry robmry marked this pull request as ready for review July 30, 2025 19:47
@robmry robmry requested review from akerouanton and corhere July 30, 2025 19:47
@robmry robmry force-pushed the nftables_filter_forward_drop branch from fa2d5c4 to b5683d1 on July 31, 2025 08:14
```diff
 // If there are no old rules to clean up, return nil.
 func NewCleaner(ctx context.Context, config firewaller.Config) firewaller.FirewallCleaner {
-	clean := func(ipv iptables.IPVersion, enabled bool) bool {
+	clean := func(ipv iptables.IPVersion, enabled bool) (bool, bool) {
```
Contributor

(bool, bool) 🙈

This could use a helper struct to describe each bool.

(or at least use named returns)
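For illustration, the two suggestions might look like this (all names here are hypothetical, not the PR's actual code):

```go
package main

import "fmt"

// Named returns document what each bool means right at the signature.
func cleanNamed() (cleaned, setDropPolicy bool) {
	return true, false
}

// A small result struct is even more explicit at call sites.
type cleanResult struct {
	Cleaned       bool
	SetDropPolicy bool
}

func cleanStruct() cleanResult {
	return cleanResult{Cleaned: true}
}

func main() {
	cleaned, setDrop := cleanNamed()
	fmt.Println(cleaned, setDrop)
	r := cleanStruct()
	fmt.Println(r.Cleaned, r.SetDropPolicy)
}
```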

Contributor Author

I've named the returns.

When a pre-29.0 daemon has been running, and it set the iptables
filter-FORWARD policy to DROP because it enabled IP forwarding, if
the daemon is stopped, upgraded to >=29.0 and nftables is enabled...

The iptables policy will still be DROP. The newly started daemon
won't need to enable IP forwarding, and there won't be a record in
the datastore indicating that it needs to be set.

So, when migrating from iptables to nftables, check the iptables
filter-FORWARD policy. If it's DROP, set the nftables policy to
drop, and note that in the datastore for future restarts. Then,
set the iptables policy to ACCEPT, so that it doesn't drop packets
for published ports (because the allow rules for them will only
be in the nftables chain).

Signed-off-by: Rob Murray <rob.murray@docker.com>
@robmry robmry force-pushed the nftables_filter_forward_drop branch from b5683d1 to 44b22f5 on July 31, 2025 10:33
Member

@akerouanton akerouanton left a comment

As I explained to you yesterday, I'm uneasy with this change, and with the whole idea of setting a default policy ourselves. The heuristic we use proved to be brittle in the past; it's getting better with this PR, but I think there are still edge cases, some of them probably hard to solve.

If we fail to set the default policy in some legitimate cases, this could be considered a security issue on our side, and worth a CVE. But if we end up in this situation, that would be caused by us drawing a target on our back… I don't like that.

> treating the introduction of nftables support as an opportunity to make a significant change.

Do you think we'll have a better opportunity? From my standpoint, with nftables support newly introduced and still experimental, this is the best opportunity we have to try alternatives and gather feedback through a GH discussion (or by any other means).

If we decide to test something else in the next release, wouldn't it be surprising for nftables users that we change how the default policy is set? Is this likely to delay nftables GA? IMO it'd be less surprising to do it right now.

> But, it's been tried before (moby/libnetwork#2450) and reverted (moby/libnetwork#2460) due to fallout including this issue in the Debian tracker.

As you mentioned, that revert doesn't mention whether an equivalent of --ip-forward-no-drop was considered, but now we have it. So maybe it's not as bad as before to set the default DROP policy unconditionally.

If that default policy breaks something, it's either because other virtualization / containerization software is installed, or because the machine is multi-homed and routing across interfaces stops working. Depending on the install order, the system state at first startup, etc., that breakage could happen only once the machine is rebooted, so maybe weeks, months, or even years after the Engine was installed, making the whole thing harder to root-cause.

As I tried to explain yesterday, if we're in one of these scenarios, then there's a high probability that there's a sysadmin in the chair. Let them decide what they want to do instead of trying to be smart and decide in place of them through a dubious (and brittle) heuristic.

Deciding what the default policy should be (or even whether one is needed) requires a sysadmin to consider the host's interfaces, the containerization / virtualization software installed, which flows coming from VMs and destined to containers should be allowed or denied, etc. These aren't under the Engine's control, so we can't take them into consideration.

By setting the default DROP policy unconditionally, we're at least very consistent, and break everything on first start. But, these excerpts from the Debian issue you linked to are a good reason to actually not set any default policy at all:

> HUH??? Since when is it okay for software that is not advertised as firewall software to modify the sysadmin's implicit or explicit iptables setup for IP packets completely unrelated to said software?

> Docker is hostile to other applications

So…

> Alternatively, go the other way - there's certainly a case to be made that Docker shouldn't enable IP forwarding on the host, or change the firewall policy to drop

I think enabling IP forwarding automatically isn't really the issue here — it's required by the bridge driver to operate.

However, we don't depend on the default policy anymore, so the only reason we still set it automatically is to ensure that we don't create security issues on users' systems. For the iptables backend, it'd be pretty bad to change that behavior now, because some users may currently depend on us setting it (knowingly or not), so we'd be creating security holes on upgrade.

We could consider that it's not our job to set the default policy, and if users are facing one of the risky scenarios I listed above, they must be knowledgeable enough to exercise due diligence, and carefully choose the nft rules they want for the rest of their system. FWIW that's what libvirt, lxc, podman / CNI, etc. do (at least, AFAIK): enable forwarding, but don't set any default policy.

If we want to experiment with this approach, or the other one (unconditionally set the default DROP policy), we may still want to store whether the Engine enabled forwarding such that we can switch to your approach in a later version.

```go
if ipv == firewaller.IPv4 {
	return ic.filterForwardDrop4
}
return ic.filterForwardDrop4
```
Member

Suggested change:

```diff
-	return ic.filterForwardDrop4
+	return ic.filterForwardDrop6
```

Comment on lines 43 to +45:

```diff
 if !t.Exists("filter", "FORWARD", "-j", DockerForwardChain) &&
 	!t.Exists("filter", "FORWARD", "-j", isolationChain1) {
-	return false
+	return false, false
```
Member

So, if I upgrade the Engine and switch to nftables backend, but between Engine stop and re-start, decide to flush my chains for whatever reason (maybe I want to switch something else from iptables-nft to nft), the Engine is going to restart in nftables mode and won't set the default DROP policy.

Is this desirable?

```go
if err := nftApply(ctx, table); err != nil {
	return err
}
return nil
```
Member

That change's not needed.

```go
	ipvKey = ipForwarding6
}
if err := d.storeUpdate(ctx, &filterForwardDrop{ipv: ipvKey}); err != nil && !errors.Is(err, datastore.ErrKeyModified) {
	log.G(ctx).WithError(err).Error("Cannot persist IP forwarding enabled flag")
```
Member

Let's say the partition storing /var/lib/docker is full. The Engine starts and sets ip_forward=1, then this call to storeUpdate fails with ENOSPC; the user figures out what's going on, fixes the issue, and restarts their Engine (without rebooting). They will end up with no default policy.

Another scenario: the Engine crashes after it sets ip_forward=1, but before this operation completes. This will also end up with no default policy.

Is this desirable?

```go
	return err
}

return d.initStore()
```
Member

If I upgrade to v29.0, stay on the iptables firewall backend at first, but manually set the default policy to ACCEPT and then decide to switch to nftables, migrateFilterForwardDrop will not touch the default policy, but initStore() will re-set DROP.

Is this acceptable?

```go
// the daemon is first started. On restarts with nftables, the policy is lost when
// the nftables table is reconstructed. So, it needs to be re-set here.)
func (d *driver) initForwardingPolicy(ctx context.Context) error {
	ffds, err := d.store.List(&filterForwardDrop{})
```
Member

Users deleting /var/lib/docker, /var/lib/docker/network, or local-kv.db is probably rare but not unheard of (see this issue for instance). If this happens, ip_forward will be enabled, but there won't be any persisted state, so this code path will consider that the default policy should be ACCEPT.

Is that acceptable?

@akerouanton
Member

To be clear, if other maintainers think it's okay to take this route, then fine - let's do that.


Labels

`area/networking/d/bridge`, `area/networking/firewalling`, `area/networking`, `impact/changelog`, `kind/bugfix`, `release-blocker`


Development

Successfully merging this pull request may close these issues.

nftables: IP forwarding, and filter-FORWARD policy

4 participants