Skip to content

libnetwork/iptables: refactor firewalld code#46237

Open
thaJeztah wants to merge 13 commits intomoby:masterfrom
thaJeztah:improve_firewalldenabled
Open

libnetwork/iptables: refactor firewalld code#46237
thaJeztah wants to merge 13 commits intomoby:masterfrom
thaJeztah:improve_firewalldenabled

Conversation

@thaJeztah
Copy link
Member

@thaJeztah thaJeztah commented Aug 15, 2023

libnetwork/iptables: un-export Conn

Un-export Conn, as it's only used internally. Rename it to firewalldConnection,
to make clearer what it's used for, and rename the sysconn field.

libnetwork/iptables: firewalldInit: move error message to newConnection

Move generating the error-message to where we know what the cause is,
instead of assuming we're unable to make a D-Bus connection.

libnetwork/iptables: firewalldInit: move error message to newConnection

Move generating the error-message to where we know what the cause is,
instead of assuming we're unable to make a D-Bus connection.

libnetwork/iptables: make newConnection() slightly more idiomatic

Don't consider a connection successful if firewalld is not running. If
it's not running, return a suitable error message. We only use the D-Bus
connection to communicate with firewalld, so no need to return a connection
if it's not running.

libnetwork/iptables: make signalHandler a method

Make it a method on the firewalldConnection, which felt more natural
than being implemented as a standalone function that depended on the
package-level variable.

libnetwork/iptables: move setting up signals into handleSignals

newConnection was setting up these listeners, but in the event we
failed to check if firewalld was running, we would error-out, and
close the connection.

Looking at b052827, all of this code
is directly related to handling signals, so let's combine this into the
handleSignals method.

libnetwork/iptables: make setupDockerZone a method

Make it a method on the firewalldConnection, which felt more natural
than being implemented as a standalone function that depended on the
package-level variable.

Also improve some error-messages to include context about the failure.

libnetwork/iptables: make firewalldInit more atomic

firewalldInit was returning an error if we failed to set up the docker
zone, but did not close the D-Bus connection. Given that we consider
firewalld to "not be usable" in case of an error, let's also close
the connection;

unable to initialize firewalld; using raw iptables instead

And return the connection on success, instead of implicitly setting the
package-level firewalld variable.

libnetwork/iptables: remove firewalldRunning package-var

Make the firewalld status a property of firewalldConnection, so that all
status is captured in the firewalld instance.

firewalldInit() already checks if firewalld is running when initializing,
in which case the firewalld variable would be nil.

While refactoring, also use an atomic.Bool; the running-status can be
updated as part of startSignalHandler(), which means that there's a
potential concurrency-issue with other code accessing the firewalld
status.

libnetwork/iptables: implement addInterface/delInterface methods

Move the implementation from AddInterfaceFirewalld and DelInterfaceFirewalld
to a method on firewalldConnection, but keeping the exported utility-functions.

libnetwork/iptables: register callbacks on firewalldConnection

Register callbacks on the firewalldConnection instance instead of a global
variable. These callbacks should only be needed if firewalld is initialized
and running, so associate them with the firewalldConnection that we created.

libnetwork/iptables: Passthrough: re-use IPVersion type

Use the existing IPVersion type to switch between ipv4 and ipv6. The IPV
type was only used for this function, and currently required callers to
map a IPVersion to IPV.

libnetwork/iptables: implement passthrough as method

Move the Passthrough implementation to a method on firewalldConnection,
and add a check if firewalld is initialized and running.

libnetwork/iptables: move firewalld helpers together

Move the exported helpers to a separate file.

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah thaJeztah added status/2-code-review area/networking Networking kind/refactor PR's that refactor, or clean-up code area/networking/firewalld Networking labels Aug 15, 2023
@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch 2 times, most recently from aa1948c to 7f9f18d Compare August 16, 2023 08:51
@thaJeztah thaJeztah marked this pull request as ready for review August 16, 2023 12:58
@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch from 7f9f18d to 28b7a65 Compare August 16, 2023 15:27
@thaJeztah thaJeztah marked this pull request as draft August 16, 2023 15:37
@thaJeztah
Copy link
Member Author

Temporarily moved to draft; I'll open some separate PRs first for some changes, to make this PR a bit smaller.

@msilveirabr
Copy link

Great rework! Hope this gets into master soon!
I'll give it a try soon

@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch 2 times, most recently from 4974c13 to 67f910b Compare August 23, 2023 17:59
@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch 2 times, most recently from 3ce72f9 to d4558b5 Compare August 24, 2023 15:24
@thaJeztah thaJeztah marked this pull request as ready for review August 24, 2023 15:26
@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch from d4558b5 to 6dd93f7 Compare August 29, 2023 19:18
Un-export Conn, as it's only used internally. Rename it to firewalldConnection,
to make clearer what it's used for, and rename the sysconn field.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move generating the error-message to where we know what the cause is,
instead of assuming we're unable to make a D-Bus connection.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Don't consider a connection successful if firewalld is not running. If
it's not running, return a suitable error message. We only use the D-Bus
connection to communicate with firewalld, so no need to return a connection
if it's not running.

Also slightly reduce depending on the global "firewalld" variable, by passing
the connection to use to checkRunning.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make it a method on the firewalldConnection, which felt more natural
than being implemented as a standalone function that depended on the
package-level variable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
newConnection was setting up these listeners, but in the event we
failed to check if firewalld was running, we would error-out, and
close the connection.

Looking at b052827, all of this code
is directly related to handling signals, so let's combine this into the
handleSignals method.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make it a method on the firewalldConnection, which felt more natural
than being implemented as a standalone function that depended on the
package-level variable.

Also improve some error-messages to include context about the failure.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
firewalldInit was returning an error if we failed to set up the docker
zone, but did not close the D-Bus connection. Given that we consider
firewalld to "not be usable" in case of an error, let's also close
the connection;

    unable to initialize firewalld; using raw iptables instead

And return the connection on success, instead of implicitly setting the
package-level `firewalld` variable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Make the firewalld status a property of firewalldConnection, so that all
status is captured in the `firewalld` instance.

firewalldInit() already checks if firewalld is running when initializing,
in which case the `firewalld` variable would be nil.

While refactoring, also use an `atomic.Bool`; the running-status can be
updated as part of `startSignalHandler()`, which means that there's a
potential concurrency-issue with other code accessing the firewalld
status.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move the implementation from AddInterfaceFirewalld and DelInterfaceFirewalld
to a method on firewalldConnection, but keeping the exported utility-functions.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Register callbacks on the firewalldConnection instance instead of a global
variable. These callbacks should only be needed if firewalld is initialized
and running, so associate them with the firewalldConnection that we created.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use the existing IPVersion type to switch between ipv4 and ipv6. The IPV
type was only used for this function, and currently required callers to
map a IPVersion to IPV.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move the Passthrough implementation to a method on firewalldConnection,
and add a check if firewalld is initialized and running.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Move the exported helpers to a separate file.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah thaJeztah force-pushed the improve_firewalldenabled branch from 6dd93f7 to 27067b2 Compare September 11, 2023 21:22
@thaJeztah
Copy link
Member Author

@akerouanton @neersighted ptal 🤗

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is firewalldConnection a singleton or not? It smells a lot like a singleton: it wraps a singleton shared connection to the message bus to interact with the singleton firewalld service to configure the singleton host network namespace. On the other hand, much of the refactoring aims to move away from singleton-ness. But the if fwd == nil { /* no-op */ return } checks sprinkled throughout smell singleton-esque. If we want to convert the firewalld related code to not be singleton shaped, I think we should have a clear dividing line between the singleton code and the non-singleton code so that the consuming code can be migrated piecemeal from the singleton API to the non-singleton API. That is, write firewalldConnection to be shaped and act like the final goal API, and wrap it in package-scope functions for the singleton API which implement the singleton semantics like stateful no-ops. The "no-op if uninitialized" checks should only exist in the singleton wrappers. Calling firewalldConnection methods with a nil receiver should (implicitly) nil-dereference panic, because buggy code should fail fast and loudly. Turning the calls into no-ops is a great defensive coding strategy for hiding and obscuring logic bugs.

It may be worth considering if the collection of on-reloaded callbacks should be a package-scoped singleton distinct from the firewalld connection. The callbacks are more libnetwork state than firewalld state. Come to think of it, perhaps the tracking of on-reloaded callbacks doesn't belong in the iptables package at all. "The set of actions which need to be performed whenever firewalld flushes its state and the iptables chains" is really more a property of the state of libnetwork's domain objects than anything. The firewalld connection object would still need to have a mechanism to register a callback to be invoked whenever firewalld is reloaded, but maybe the libnetwork controller or something should become responsible for fanning out that callback?

Comment on lines +187 to +188
var zone string
if err := fwd.sysObj.Call(dbusInterface+".getDefaultZone", 0).Store(&zone); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only care about whether the call succeeded, not the value.

Suggested change
var zone string
if err := fwd.sysObj.Call(dbusInterface+".getDefaultZone", 0).Store(&zone); err != nil {
if err := fwd.sysObj.Call(dbusInterface+".getDefaultZone", 0).Err != nil {

Also: #46658

)

const (
// See https://github.com/firewalld/firewalld/blob/v1.3.3/doc/xml/firewalld.dbus.xml#L49-L64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// See https://github.com/firewalld/firewalld/blob/v1.3.3/doc/xml/firewalld.dbus.xml#L49-L64
// See https://firewalld.org/documentation/man-pages/firewalld.dbus.html

rule = fmt.Sprintf("type='signal',interface='org.freedesktop.DBus',member='NameOwnerChanged',path='/org/freedesktop/DBus',sender='org.freedesktop.DBus',arg0='%s'", dbusInterface)
c.sysconn.BusObject().Call("org.freedesktop.DBus.AddMatch", 0, rule)
if !c.checkRunning() {
_ = c.conn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Close must not be called on shared connections. c.conn is assigned the connection returned from dbus.SystemBus(). dbus.SystemBus() returns a shared connection, therefore c.conn.Close() must never be called or c.conn needs to be changed to an unshared connection.

fwd.conn.Signal(fwd.signal)

// FIXME(thaJeztah): there's currently no way to terminate this goroutine.
// TODO(thaJeztah): should this be rewritten to use dbus.WithSignalHandler(), instead of a self-crafted solution?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I doubt we would want to swap out the pubsub implementation that powers conn.Signal() with our own self-crafted solution.

fwd.conn.BusObject().Call("org.freedesktop.DBus.AddMatch", 0, rule)
fwd.conn.Signal(fwd.signal)

// FIXME(thaJeztah): there's currently no way to terminate this goroutine.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goroutine terminates when the fwd.signal channel is closed. The channel is closed when the dbus.Conn is closed. Therefore the lifetime of the goroutine is bounded by the lifetime of the D-Bus connection.

Comment on lines +65 to +67
// It returns an error if it's unable to establish a D-Bus connection, or
// if firewalld is not running.
func newConnection() (*firewalldConnection, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it an error for firewalld to not be running? The connection is to the system bus. With all the OnReloaded callbacks, it seems like it should be possible for us to gracefully handle the user enabling and starting firewalld after the daemon and containers have started by reacting to NameOwnerChanged signals emitted by the message bus when firewalld connects.

Comment on lines +138 to +139
// onReload executes all registered callbacks.
func (fwd *firewalldConnection) onReload() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OnReload is the name of the package-scope function to register a reload callback. Naming the function which executes the callbacks the same as the function which registers the callbacks is evil. Call it handleReload or something instead.

//
// It is a no-op if firewalld is not running or firewalldConnection is not
// initialized.
func (fwd *firewalldConnection) registerReloadCallback(callback func()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (fwd *firewalldConnection) registerReloadCallback(callback func()) {
func (fwd *firewalldConnection) OnReloaded(callback func()) {

"On firewalld reload, do thing" is clear and concise. No need to overcomplicate the naming.

Comment on lines +172 to +174
if fwd == nil {
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing no-ops where there weren't previously is really stretching the definition of refactor. The calling code wanted to register a callback, period. So register the callback or panic trying.

Comment on lines +162 to +171
// Note that we're not checking if firewalld is running here,
// as the previous code did not check for this, and (technically),
// the callbacks may not require firewalld, or may be registered
// when firewalld is not running.
//
// We're assuming here that all callback can be executed if this
// onReload function is triggered (either from a D-Bus event, or
// otherwise).
//
// TODO(thaJeztah): verify if these should always be run, or only if firewalld is running at the moment the callback is registered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the callbacks should always be registered no matter the status of firewalld at the time of registration. The callbacks are run whenever firewalld forgets all its state and flushes away all the iptables rules we passed through, so that the callbacks can restore it all. Firewalld flushes all the iptables chains when it first starts up, so all firewalld reload callbacks registered while firewalld wasn't running are not to be ignored.

@corhere
Copy link
Contributor

corhere commented Oct 16, 2023

The only callers to iptables.OnReloaded are the bridge driver (to replay iptables rule insertions) and this gem (to replay iptables chain creation and rule insertion):

func setupArrangeUserFilterRule(c *Controller) {
ctrl = c
iptables.OnReloaded(arrangeUserFilterRule)
}
// arrangeUserFilterRule sets up the DOCKER-USER chain for each iptables version
// (IPv4, IPv6) that's enabled in the controller's configuration.
func arrangeUserFilterRule() {
if ctrl == nil {
return
}
for _, ipVersion := range ctrl.enabledIptablesVersions() {
if err := setupUserChain(ipVersion); err != nil {
log.G(context.TODO()).WithError(err).Warn("arrangeUserFilterRule")
}
}
}

I'm going to experiment with a PR which has the firewalld connection call into the libnetwork controller, which in turn calls into the drivers to replay the insertions.

@msilveirabr
Copy link

Hi.
The test results are from Fedora 40
I've just run my tests against docker version 26.1, and it is working as expected (firewalld nftables backend) ...
It's been a long time since I started all this, and the only difference I see is: the PR I pushed did not add the interface to the docker zone when setting

com.docker.network.bridge.enable_icc: "false"

So, although I didn't inspect what exactly the code is doing about the firewall rules it seems to be okay, but I stumbled upon Issue #43440, or similar :

root@localhost:/etc/firewalld# docker ps --all
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
root@localhost:/etc/firewalld# firewall-cmd --list-interfaces --zone=docker 
docker0
root@localhost:/etc/firewalld# firewall-cmd --reload
success
root@localhost:/etc/firewalld# firewall-cmd --list-interfaces --zone=docker 
docker0 docker_default icc icc_nointernal icc_internal noicc noicc_noint noicc_internal
root@localhost:/etc/firewalld# firewall-cmd --list-interfaces --zone=docker | cut -d " " -f 2- | tr ' ' '\n' | while read z ; do firewall-cmd --zone=docker --remove-interface=$z ; done
success
success
success
success
success
success
success
root@localhost:/etc/firewalld# systemctl restart docker
root@localhost:/etc/firewalld# firewall-cmd --list-interfaces --zone=docker 
docker0
root@localhost:/etc/firewalld# firewall-cmd --reload
success
root@localhost:/etc/firewalld# firewall-cmd --list-interfaces --zone=docker 
docker0

Another way to fix this is to restart docker, then firewalld ...

There's a delay between firewall reload the the population of the interfaces into the docker zone.
The only way to remove the interfaces is to restart docker service, not reload ( is there any new feature I'm missing or is it really related to #43440? )

The results with iptables as firewalld backend is not OK.

If desirable, I could post/upload my 2-files test suite.

Thanks for the improvements!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking/firewalld Networking area/networking Networking kind/refactor PR's that refactor, or clean-up code status/2-code-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants