Skip to content

nftables: never enable IP forwarding on the host#50646

Merged
robmry merged 6 commits intomoby:masterfrom
robmry:nftables_no_enable_ip_forwarding
Aug 11, 2025
Merged

nftables: never enable IP forwarding on the host#50646
robmry merged 6 commits intomoby:masterfrom
robmry:nftables_no_enable_ip_forwarding

Conversation

@robmry
Copy link
Contributor

@robmry robmry commented Aug 6, 2025

- What I did

When Docker is using iptables and it enables IP forwarding, unless daemon option --ip-forward-no-drop is true, it sets the iptables filter-FORWARD policy to DROP.

When running with nftables - don't do that, raise an error if forwarding is needed but not enabled, unless --ip-forward=false.

On a host without IPv4 forwarding enabled, nftables firewall backend, and other settings at defaults - the daemon will fail to start when it tries to create the default bridge. It will fail to restore other networks, but those errors are only logged, not sure why. In other cases, for example --bridge=none or creating an IPv6 network when IPv6 global forwarding is disabled, network creation will fail with an error.

Log a warning when migrating from iptables to nftables when the iptables policy is DROP, because it'll drop packets that have been accepted by nftables rules (so, for example, published ports won't work).

In 29.0, nftables is opt-in, and will be documented as "experimental". So, if we get push-back on this or come up with better ideas, we'll be able to change it.

Docs will need an update to say it's the user's responsibility to update their firewall rules to block unwanted forwarding between non-Docker network interfaces.

- How I did it

  • The first two commits are tidy-up - only nftabler needs an iptables cleaner, and only the iptabler (now) needs a method to set the filter-FORWARD policy. So, use those types directly in the bridge driver when needed.
  • Raise the error when using nftables and forwarding is not enabled, and update tests.
  • Log a warning about the iptables policy on migration to nftables.
  • Enable IP forwarding in rootlesskit's netns, before starting the daemon.
  • Update check-config.sh to report on the IP forwarding sysctls.

- How to verify it

  • Start Docker with IP forwarding disabled and --firewall-backend=iptables - check the filter-FORWARD policies are DROP.
  • Start Docker again with --firewall-backend=nftables, check logs for a warning about the iptables policy, for example ...
WARN[2025-08-06T13:26:03.383667797Z] Network traffic for published ports may be dropped, iptables chain FORWARD has policy DROP.  ipv=ipv4
  • Disable IP forwarding, restart with nftables, check the daemon fails with an error ...
failed to start daemon: Error initializing network controller: error creating default "bridge" network: IPv4 forwarding is disabled: check your host's firewalling and set sysctl net.ipv4.ip_forward=1, or disable this check using daemon option --ip-forward=false
  • With rootless Docker, checked a dual-stack bridge network was created ok with nftables (and wasn't without the sysctls in this PR), and that port publishing worked.

And, updated tests.

- Human readable description for the release notes

- nftables: Docker will not enable IP forwarding on the host. If forwarding is needed by a bridge network, but not enabled, daemon startup or network creation will fail with an error. You must either enable forwarding and ensure firewall rules are in place to prevent unwanted forwarding between non-Docker interfaces. Or, use daemon option `--ip-forward=false` to disable the check, but some bridge network functionality including port forwarding may not work. See [Engine Docs](https://docs.docker.com/engine/network/firewall-nftables) for more information about migration from iptables to nftables.

@robmry robmry added this to the 29.0.0 milestone Aug 6, 2025
@robmry robmry self-assigned this Aug 6, 2025
@robmry robmry added kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny area/networking Networking impact/changelog docs/revisit area/networking/firewalling Networking area/networking/d/bridge Networking release-blocker PRs we want to block a release on labels Aug 6, 2025
@robmry robmry force-pushed the nftables_no_enable_ip_forwarding branch 2 times, most recently from efcbb53 to bff3648 Compare August 6, 2025 15:28
@robmry robmry marked this pull request as ready for review August 6, 2025 16:05
@robmry robmry requested a review from tianon as a code owner August 6, 2025 16:05
@robmry robmry requested review from akerouanton and corhere August 6, 2025 16:05
@robmry robmry force-pushed the nftables_no_enable_ip_forwarding branch from 4b920c3 to ed345d2 Compare August 6, 2025 16:16
@robmry
Copy link
Contributor Author

robmry commented Aug 6, 2025

@AkihiroSuda - one of the commits here enables the IP forwarding sysctls in rootlesskit's netns ...

With --firewall-backend=nftables (which will eventually become the default), this PR makes it the user's job to configure their kernel as an installation step.

The dockerd-rootless change here enables forwarding whether using iptables or nftables. I don't think there's any harm in that, there aren't any non-Docker interfaces to forward between (apart from the netns's main tap0 interface, which needs forwarding enabled for port publishing to work anyway). It means Docker with iptables won't set a default DROP policy on the FORWARD chain, but I think that's also ok as there's no non-Docker traffic to forward.

Not sure whether I've put the sysctl commands in the best place?

Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, otherwise LGTM. Thanks!

@robmry robmry force-pushed the nftables_no_enable_ip_forwarding branch from ed345d2 to 53853b4 Compare August 7, 2025 09:02
Copy link
Contributor

@austinvazquez austinvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM on green CI run. At a glance, the current failures are all related to known flaky test or unknown CI issue.

@austinvazquez
Copy link
Contributor

Needs rebase for CI fixes, then g2g. 👍🏼

@robmry robmry force-pushed the nftables_no_enable_ip_forwarding branch from 53853b4 to 5b7dc6d Compare August 8, 2025 10:50
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks sane to me; left some ramblings (as usual). Not merging as I wasn't sure if you wanted additional review from Cory

LGTM

// to iptables built-in chains. So, if cleanup is needed, give the cleaner to
// the nftabler. Then, it'll use it to delete old rules as networks are restored.
fw.(firewaller.FirewallCleanerSetter).SetFirewallCleaner(iptabler.NewCleaner(ctx, config))
fw.SetFirewallCleaner(iptabler.NewCleaner(ctx, config))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, just that my curiosity led me there; this looks to be the only place where we call fw.SetFirewallCleaner to a non nil value, and the constructor is above.

Which made me curious if this should be done as part of the constructor, possibly with a DisableCleaner(), DestroyCleaner, StopCleaning() (or whatever) on the Nftabler type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as I get a chance - I'll keep the cleaner in the bridge driver so the firewaller implementation(s) don't need to call it (review comment from @akerouanton) ... it's a bit simpler this way, because the current firewaller has all the context for deleting old rules, the bridge driver will need to fish it out somehow. But, Albin's right, if we ever add a new firewaller it shouldn't really need to call the cleaner itself.

Comment on lines +92 to +94
line, err := os.ReadFile(path)
if err != nil {
return false, err
return false, fmt.Errorf("reading '%s': %w", path, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious; was information missing in the error, or did you pre-emptively add this? Because os.ReadFile (and other file operations from stdlib) should usually return a fs.PathError / os.PathError, which already decorates the error with "Op" (e.g. open ), Path, and the error; https://github.com/golang/go/blob/768c51e368e05739854c1413fdf7fd129d01e482/src/io/fs/fs.go#L250-L255

// PathError records an error and the operation and file path that caused it.
type PathError struct {
    Op   string
    Path string
    Err  error
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, probably preemptive - I'll get rid of it.

f.ffd = ipv
func (f *ffDropper) FilterForwardDrop(_ context.Context, ipv firewaller.IPVersion) error {
switch ipv {
case firewaller.IPv4:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! The other IPVersion type 😂

Not for this PR, but it got me curious; is this always either/or, or could be "both"? Considering that the other type has that as an option (so if it could set both values at once);

type IPFamily int
const (
IP IPFamily = syscall.AF_UNSPEC // Either IPv4 or IPv6.
IPv4 IPFamily = syscall.AF_INET // Internet Protocol version 4 (IPv4).
IPv6 IPFamily = syscall.AF_INET6 // Internet Protocol version 6 (IPv6).
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the firewaller, it's always either/or ... with iptables/ip6tables the rules are always separate. I decided to keep it that way in nftables too (although it has an inet type as well as ip/ip6, so one table can deal with both families).

But, for us, there's not much advantage in trying to combine them. The rules for bridge networks can be configured quite differently for v4/v6 ... for example, NAT for v4 and direct routing for v6.

robmry added 6 commits August 8, 2025 18:43
Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
For nftables only, never enable IP forwarding on the host. Instead,
return an error on network creation if forwarding is not enabled,
required by a bridge network, and --ip-forward=true.

If IPv4 forwarding is not enabled when the daemon is started with
nftables enabled and other config at defaults, the daemon will
exit when it tries to create the default bridge.

Otherwise, network creation will fail with an error if IPv4/IPv6
forwarding is not enabled when a network is created with IPv4/IPv6.

It's the user's responsibility to configure and secure their host
when they run Docker with nftables.

Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
@robmry robmry force-pushed the nftables_no_enable_ip_forwarding branch from 5b7dc6d to 2fff6b4 Compare August 8, 2025 17:43
@robmry robmry merged commit 2d0bc97 into moby:master Aug 11, 2025
175 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking/d/bridge Networking area/networking/firewalling Networking area/networking Networking docs/revisit impact/changelog kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny release-blocker PRs we want to block a release on

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nftables: IP forwarding, and filter-FORWARD policy

4 participants