
Use libnftables in dynamically linked binary #51033

Merged: robmry merged 2 commits into moby:master from robmry:use-libnftables on Oct 3, 2025


robmry (Contributor) commented Sep 24, 2025

- What I did

When dockerd is dynamically linked, use cgo to call functions in libnftables instead of exec-ing the nft binary.

On my M2 MacBook, using the library knocks about 20ms off each update, bringing it down to less than 1ms. Initialisation and teardown operations happen sequentially, so the time saving comes directly off the total.

Using libnftables brings nftables update performance in line with iptables. Typical times:

| Operation | iptables | nft binary | libnftables |
|---|---|---|---|
| `docker network create --ipv6 b46` | 48ms | 57ms | 10ms |
| `docker run --rm -ti --network b46 -p 8080:80 busybox` (start) | 95ms | 185ms | 109ms |
OTEL traces (collapsed screenshots): network create and container start, each traced with iptables, the nft binary and libnftables.

- How I did it

Use cgo to call libnftables functions.

In the packaging scripts/config, we'll need to add a build dependency on libnftables-dev and an installation dependency on libnftables.

- How to verify it

Unit tests use the library; integration tests (run against the statically linked binary) use the nft tool.

- Human readable description for the release notes

- When dynamically linked, the Docker daemon now depends on libnftables.

robmry added this to the 29.0.0 milestone Sep 24, 2025
robmry self-assigned this Sep 24, 2025
robmry added the kind/enhancement, area/networking, impact/changelog and area/networking/firewalling labels Sep 24, 2025
robmry force-pushed the use-libnftables branch 6 times, most recently from 3626da4 to 5b37749, September 24, 2025 14:16
robmry marked this pull request as ready for review September 24, 2025 16:13
Comment on lines +80 to +82
_ = nft.table4.Close()
_ = nft.table6.Close()
return nil
Contributor:

Suggested change (replacing the three lines above):
return errors.Join(nft.table4.Close(), nft.table6.Close())

robmry (Contributor Author):

Done

MustFlush bool

applyLock sync.Mutex
nftHandle any // applyLock must be held to access
Contributor:

I have an idea on how to make this field concretely typed.

Suggested change (replacing the `nftHandle any` line above):
nftHandle nftHandle // applyLock must be held to access
// nft_cgo_linux.go
type nftHandle = *C.struct_nft_ctx
// nft_exec_linux.go
type nftHandle = struct{}

robmry (Contributor Author):

Good idea, thank you! Done.
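With the suggested alias, each build variant declares its own `nftHandle` in a file guarded by a build constraint (the cgo file aliases `*C.struct_nft_ctx`, the exec file aliases `struct{}`), so the `table` field needs no type assertion. The single-file sketch below stands in for the cgo variant only. `nftCtx` is a hypothetical Go model of `struct nft_ctx`, and the signatures of `newNftHandle` and `apply` are assumptions. It also shows the handle being created lazily under `applyLock`, as in the `if t.nftHandle == nil` snippet quoted later in this review.

```go
package main

import (
	"fmt"
	"sync"
)

// nftCtx is a hypothetical stand-in for C's struct nft_ctx.
type nftCtx struct{ id int }

// nftHandle is a type alias. In a cgo build it would alias
// *C.struct_nft_ctx; an exec build would declare a different alias
// in a file with the opposite build constraint.
type nftHandle = *nftCtx

var handlesCreated int

// newNftHandle is a hypothetical constructor; a real one would ask
// libnftables to allocate a context and its output/error buffers.
func newNftHandle() (nftHandle, error) {
	handlesCreated++
	return &nftCtx{id: handlesCreated}, nil
}

type table struct {
	applyLock sync.Mutex
	nftHandle nftHandle // applyLock must be held to access
}

// apply creates the handle on first use, with applyLock held,
// and reuses the same handle on later calls.
func (t *table) apply(cmd string) (int, error) {
	t.applyLock.Lock()
	defer t.applyLock.Unlock()
	if t.nftHandle == nil {
		handle, err := newNftHandle()
		if err != nil {
			return 0, err
		}
		t.nftHandle = handle
	}
	_ = cmd // a real implementation would pass cmd to libnftables here
	return t.nftHandle.id, nil
}

func main() {
	var t table
	a, _ := t.apply("add table ip example")
	b, _ := t.apply("flush table ip example")
	fmt.Println(a, b, handlesCreated)
}
```

Because the field has a concrete type per build, a mismatch between the two variants shows up as a compile error rather than a failed type assertion at run time.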

Signed-off-by: Rob Murray <rob.murray@docker.com>
akerouanton (Member) left a comment

LGTM, but wondering if we could do without table.applyLock.

defer span.End()

if t.nftHandle == nil {
handle, err := newNftHandle()
Member:

How costly is it to instantiate a new struct nft_ctx and allocate new out/err buffers? I'm wondering whether we could avoid table.applyLock if each call to nftApply used its own struct nft_ctx, since this lock serializes operations on independent networks/sandboxes.

robmry (Contributor Author), Sep 25, 2025:

The applyLock isn't new: it protects the table while updates from a Modifier are applied to it, while an nftables command buffer is generated from the table, and during nftApply (in case the update fails and changes to the table need to be rolled back).

The daemon has a single table in the host's netns (plus tables in each container netns for DNS) with rules for all networks/endpoints. The table in the host netns has nftables base chains that do a single vmap lookup to decide whether a packet needs further processing. If it does, the packet is processed by short chains dealing with a specific network. The alternative would be a table per network, which would make it possible to update different networks in parallel. But then, instead of a single vmap lookup, each packet (including non-Docker packets) would need to be matched against the base chain rules in every per-network table.
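The cost difference behind that design choice can be shown with a toy model. This is plain Go, not nftables code: a Go map stands in for the vmap, and the interface and chain names are hypothetical. It relies on the nftables behaviour that an accept verdict in one base chain does not skip other base chains registered on the same hook, so with per-network tables every packet visits every table's base chain.

```go
package main

import "fmt"

// verdictMap models the single vmap in the shared table's base chain:
// one lookup on the input interface picks a per-network chain, if any.
type verdictMap map[string]string

// classifySingleTable returns the chain to jump to, whether one was
// found, and the number of lookups: always one, however many networks.
func classifySingleTable(vmap verdictMap, iface string) (string, bool, int) {
	chain, ok := vmap[iface]
	return chain, ok, 1
}

// perNetworkTable models the alternative: one table per network, each
// with a base chain that matches only that network's bridge.
type perNetworkTable struct {
	bridge string
	chain  string
}

// classifyPerNetwork visits every table's base chain, since base chains
// on the same hook are all traversed, so every packet (including
// non-Docker traffic) is checked once per network.
func classifyPerNetwork(tables []perNetworkTable, iface string) (string, bool, int) {
	checks := 0
	chain, found := "", false
	for _, t := range tables {
		checks++
		if !found && t.bridge == iface {
			chain, found = t.chain, true
		}
	}
	return chain, found, checks
}

func main() {
	vmap := verdictMap{"br-a": "net-a", "br-b": "net-b", "br-c": "net-c"}
	tables := []perNetworkTable{
		{bridge: "br-a", chain: "net-a"},
		{bridge: "br-b", chain: "net-b"},
		{bridge: "br-c", chain: "net-c"},
	}
	// A non-Docker packet arriving on eth0.
	_, _, single := classifySingleTable(vmap, "eth0")
	_, _, perNet := classifyPerNetwork(tables, "eth0")
	fmt.Println(single, perNet)
}
```

The per-packet cost of the shared table stays constant as networks are added, which is the trade-off accepted in exchange for serializing updates under applyLock.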

robmry (Contributor Author) commented Sep 25, 2025

I'm still looking at the packaging changes needed for this ... please don't merge it yet.

With this tag, a dynamically linked binary will exec the nft tool instead of using cgo to call libnftables directly.

Signed-off-by: Rob Murray <rob.murray@docker.com>
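That commit keeps an exec path available in dynamically linked builds (the tag's name is not given here). A minimal sketch of what exec-ing nft with a generated command buffer might look like. It assumes the buffer is piped to `nft -f -`, which nft(8) documents as reading input from stdin. `newNftCmd` and `execNftApply` are hypothetical names, not the daemon's functions.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// newNftCmd prepares, but does not start, a command that feeds an
// nftables command buffer to the nft tool on stdin ("-f -").
func newNftCmd(cmds string) *exec.Cmd {
	cmd := exec.Command("nft", "-f", "-")
	cmd.Stdin = bytes.NewBufferString(cmds)
	return cmd
}

// execNftApply runs nft and includes its output in any error, since
// nft reports syntax and netlink errors as text.
func execNftApply(cmds string) error {
	out, err := newNftCmd(cmds).CombinedOutput()
	if err != nil {
		return fmt.Errorf("nft -f -: %w: %s", err, bytes.TrimSpace(out))
	}
	return nil
}

func main() {
	// Only builds the command; running it would need nft and privileges.
	cmd := newNftCmd("list ruleset\n")
	fmt.Println(cmd.Args)
}
```

Each call to this path starts a new process, which is the per-update overhead the PR's timings attribute to the nft binary.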
robmry (Contributor Author) commented Sep 26, 2025

> I'm still looking at the packaging changes needed for this ... please don't merge it yet.

I've added a commit that makes it possible to build a dynamically linked dockerd that still execs "nft"; we'll use that for RHEL builds for now (experimental nftables release). RHEL requires a subscription to get hold of the "nftables-devel" package, and we've been waiting on an arm64 license.

Packaging PR: docker/docker-ce-packaging#1256

robmry merged commit b26972f into moby:master Oct 3, 2025
342 of 348 checks passed
robmry mentioned this pull request Oct 15, 2025

Labels: area/networking/firewalling, area/networking, impact/changelog, kind/enhancement


3 participants