Use libnftables in dynamically linked binary (#51033)
Force-pushed: 3626da4 → 5b37749
Suggested change:

```diff
-_ = nft.table4.Close()
-_ = nft.table6.Close()
-return nil
+return errors.Join(nft.table4.Close(), nft.table6.Close())
```
```go
MustFlush bool

applyLock sync.Mutex
nftHandle any // applyLock must be held to access
```
I have an idea on how to make this field concretely typed.
Suggested change:

```diff
-nftHandle any // applyLock must be held to access
+nftHandle nftHandle // applyLock must be held to access
```

```go
// nft_cgo_linux.go
type nftHandle = *C.struct_nft_ctx

// nft_exec_linux.go
type nftHandle = struct{}
```
Good idea, thank you! Done.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Force-pushed: 5b37749 → 6db6de2
akerouanton left a comment:
LGTM, but wondering if we could do without table.applyLock.
```go
defer span.End()

if t.nftHandle == nil {
	handle, err := newNftHandle()
```
How costly is it to instantiate a new struct nft_ctx and allocate new out/err buffers? I'm wondering if we could avoid table.applyLock if each call to nftApply is using its own struct nft_ctx as this lock serializes operations on independent networks / sandboxes.
The applyLock isn't new ... it's there to protect the table while updates from a Modifier are applied to it, while an nftables command buffer is generated from the table, and during nftApply (in case the update fails and changes to the table need to be rolled back).
The Daemon has a single table in the host's netns (plus tables in each container netns for DNS) with rules for all networks/endpoints. The table in the host netns has nftables base chains that do a single vmap lookup to decide whether the packet needs further processing. If they do, packets are processed by short chains dealing with a specific network. The alternative would be a table per-network, then it'd be possible to make updates for different networks in parallel. But then, instead of a single vmap lookup, each packet (including non-Docker packets) would need to be matched against base chain rules in each of the per-network tables.
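For illustration, the dispatch shape described above in nft syntax (a hand-written sketch, not Docker's actual ruleset; table, chain, and interface names are made up):

```
table ip docker-bridges {
    chain filter-forward {
        type filter hook forward priority filter; policy accept;
        # a single vmap lookup decides whether the packet needs
        # per-network processing
        oifname vmap { "docker0" : jump fwd-docker0, "br-net1" : jump fwd-net1 }
    }
    chain fwd-docker0 {
        # short chain with rules for this network only
    }
    chain fwd-net1 {
        # ...
    }
}
```

With one table per network instead, each base chain's rules would have to be evaluated for every forwarded packet, Docker-related or not.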
I'm still looking at the packaging changes needed for this ... please don't merge it yet.
With this tag, a dynamically linked binary will exec the nft tool instead of using cgo to call libnftables directly.

Signed-off-by: Rob Murray <rob.murray@docker.com>
I've added a commit that makes it possible to build a dynamically linked dockerd that still execs "nft" - we'll use that for RHEL builds for now (experimental nftables release). RHEL requires a subscription to get hold of the "nftables-devel" package, and we've been waiting on an arm64 license. Packaging PR - docker/docker-ce-packaging#1256
- What I did
When dockerd is dynamically linked, use cgo to call functions in libnftables instead of exec-ing the nft binary.
On my M2 MacBook, using the library knocks about 20ms off each update, down to less than 1ms. Initialisation and teardown operations happen sequentially, so the time saving comes directly off the total.
Using libnftables brings nftables update performance in line with iptables - typical times ...
OTEL traces (images not included here) compared network create and container start times for iptables, the nft binary, and libnftables.
- How I did it
Use cgo to call libnftables functions.
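A minimal cgo sketch of the libnftables calls involved (illustrative only, not the PR's implementation; building requires `libnftables-dev`, and running the command typically requires root):

```go
package main

/*
#cgo LDFLAGS: -lnftables
#include <stdlib.h>
#include <nftables/libnftables.h>
*/
import "C"

import (
	"errors"
	"unsafe"
)

// runNft allocates an nft context, runs one command string, and frees the
// context. Reusing a single context across calls (guarded by a lock, as in
// the PR) avoids repeating this setup for every update.
func runNft(cmd string) error {
	ctx := C.nft_ctx_new(C.NFT_CTX_DEFAULT)
	if ctx == nil {
		return errors.New("nft_ctx_new failed")
	}
	defer C.nft_ctx_free(ctx)

	cs := C.CString(cmd)
	defer C.free(unsafe.Pointer(cs))

	// nft_run_cmd_from_buffer returns non-zero on error.
	if rc := C.nft_run_cmd_from_buffer(ctx, cs); rc != 0 {
		return errors.New("nft command failed")
	}
	return nil
}

func main() {
	_ = runNft("list tables")
}
```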
In packaging scripts/config, we'll need to add a build dependency on `libnftables-dev` and an installation dependency on `libnftables`.

- How to verify it
Unit tests use the library, integration tests (against the statically linked binary) use the nft tool.
- Human readable description for the release notes
- When dynamically linked, the Docker daemon now depends on libnftables.