libnetwork/d/overlay: fix logical race conditions#50260
Conversation
There was a problem hiding this comment.
Pull Request Overview
Improve overlay driver concurrency by introducing per-network locking, integrating peer databases into network structs, and synchronizing encryption state updates.
- Added
countmaputility for tracking and auto-removing zero counts, with unit tests. - Refactored peer database (
peerdb.go), moving it into eachnetworkand replacing global locks with per-networkmu. - Introduced
driver.muas the bottom-level lock, implementedlockNetworkfor safe network operations, and synchronized encryption viaencrMap.mu.
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| libnetwork/internal/countmap/countmap.go | New generic Map[T] with Add that auto-removes zero entries |
| libnetwork/internal/countmap/countmap_test.go | Tests for Map.Add, verifying count changes and removal |
| libnetwork/drivers/overlay/peerdb.go | Moved peer db into peerMap, removed global lock, refactored Walk, Get, Add, Delete |
| libnetwork/drivers/overlay/overlay.go | Reorganized driver struct, introduced bottom-level mu, replaced old locks |
| libnetwork/drivers/overlay/ov_network.go | Added network.mu, integrated peerdb and fdbCnt, implemented lockNetwork, DCL in creation |
| libnetwork/drivers/overlay/ov_endpoint.go | Switched to lockNetwork for endpoint ops, removed manual lock helpers |
| libnetwork/drivers/overlay/joinleave.go | Updated Join/Leave/EventNotify to use per-network lock and new peer db API |
| libnetwork/drivers/overlay/encryption.go | Added encrMap.mu, refactored encryption key management, synchronized SA/Policy updates |
Comments suppressed due to low confidence (3)
libnetwork/drivers/overlay/encryption.go:286
- [nitpick] The parameter name
spishadows the typespi, which can be confusing; consider renaming the parameter to e.g.sporidx.
func programSA(localIP, remoteIP net.IP, spi spi, k *key, dir int, add bool) (fSA *netlink.XfrmState, rSA *netlink.XfrmState, lastErr error) {
libnetwork/drivers/overlay/ov_network.go:243
- Missing space after colon and use
%wfor wrapping: change to"could not delete neighbor entry in the sandbox: %w".
"network_id": n.id,
libnetwork/drivers/overlay/peerdb.go:12
- The import "syscall" is unused after the refactoring and will cause a compile error; please remove it.
"syscall"
5aef59b to
b15a542
Compare
b15a542 to
d3c2d7b
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
Looks good! I left some thoughts / ideas; happy to hear your thoughts!
| if n.secure && len(d.keys) == 0 { | ||
| return errors.New("cannot join secure network: encryption keys not present") | ||
| if n.secure { | ||
| d.secMap.mu.Lock() |
There was a problem hiding this comment.
I guess to some extent, secMap could follow the same pattern, but mostly to not have to "dive into it's internals to know about the lock"), so not a strong objection / issue.
sm, release := d.lockSecMap()
defer release()
sm.keys = append(....)
// other thingsThere was a problem hiding this comment.
secMap doesn't have any methods (aside from implementing the Stringer interface) -- its internals and its interface are one and the same! There is no avoiding "diving into its internals" so I see little benefit to adding boilerplate around locking a mutex.
| @@ -172,8 +203,6 @@ func (d *driver) CreateNetwork(ctx context.Context, id string, option map[string | |||
| } | |||
There was a problem hiding this comment.
Unrelated, but curious if the TODO above still holds
moby/libnetwork/drivers/overlay/ov_network.go
Lines 169 to 172 in 8652cf6
Looks like there was a n.writeToStore() at the time; b64997e but I don't see that anymore;
moby/libnetwork/drivers/overlay/ov_network.go
Lines 212 to 214 in b64997e
There was a problem hiding this comment.
The writeToStore call was removed in 0fa873c. The TODO comment is no longer relevant.
There was a problem hiding this comment.
d3c2d7b to
aa88168
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
LGTM
Left a comment about the walkNodes method; non-blocking, but let me know if they make sense.
aa88168 to
f9daf69
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
still LGTM, thx!
@robmry @akerouanton ptal
robmry
left a comment
There was a problem hiding this comment.
LGTM - just nits/questions ...
It is easier to find all references when they are struct fields rather than embedded structs. Signed-off-by: Cory Snider <csnider@mirantis.com>
func (*driver) secMapWalk is a curious beast. It is named walk, yet it also mutates the collection being iterated over. It returns an error, but that error is always nil. It takes a callback that can break iteration, yet the only caller makes no use of that affordance. Its utility is limited and the abstraction hinders readability more than it helps. Open-code the d.secMap.nodes loop into func (*driver) updateKeys(), the only caller. Signed-off-by: Cory Snider <csnider@mirantis.com>
f9daf69 to
89b6227
Compare
There is a dedicated mutex for synchronizing access to the encrMap. Separately, the main driver mutex is used for synchronizing access to the encryption keys. Their use is sufficient to prevent data races (if used correctly, which is not the case) but not logical race conditions. Programming the encryption parameters for a peer can race with encryption keys being updated, which could lead to inconsistencies between the parameters programmed into the kernel and the desired state. Introduce a new mutex for synchronizing encryption operations. Use that mutex to synchronize access to both encrMap and keys. Handle encryption key updates in a critical section so they can no longer be interleaved with kernel programming of encryption parameters. Signed-off-by: Cory Snider <csnider@mirantis.com>
The concurrency control in the overlay driver is logically unsound. While the use of mutexes is sufficient to prevent data races -- violations of the Go memory model -- many operations which need to be atomic are performed with unbounded concurrency. Overhaul the use of locks in the overlay network driver. Implement sound locking at the network granularity: operations may proceed concurrently iff they are being applied to distinct networks. Push the responsibility of locking up to the code which calls methods or accesses struct fields to avoid deadlock situations like we had previously with d.initSandboxPeerDB() and to make the code easier to reason about. Each overlay network has a distinct peer db. The NetworkDB watch for the overlay peer table for the network will only start after (*driver).CreateNetwork returns and will be stopped before libnetwork calls (*driver).DeleteNetwork, therefore the lifetime of the peer db for a network is constrained to the lifetime of the network itself. Yet the peer db for a network is tracked in a dedicated map, separately from the network objects themselves. This has resulted in a parallel set of mutexes to manage concurrency of the peer db distinct from the mutexes for the driver and networks. Move the peer db for a network into a field of the network struct and guard it from concurrent access using the per-network lock. Move the methods for manipulating the peer db into the network struct so that the methods can only be called if the caller has a reference to the network object. Network creation and deletion are synchronized using the driver-scope mutex, but some of the kernel programming is performed outside of the critical section. It is possible for network deletion to race with recreating the network, interleaving the kernel programming for the network creation and deletion, resulting in inconsistent kernel state. Parallelize network creation and deletion soundly. Use a double-checked locking scheme to soundly handle the case of concurrent CreateNetwork and DeleteNetwork for the same network id without blocking operations on other networks. Synchronize operations on a network so that operations on the network such as adding a neighbor to the peer db are performed atomically, not interleaved with deleting the network. Signed-off-by: Cory Snider <csnider@mirantis.com>
89b6227 to
89d3419
Compare
|
I have a few pending comments but I need to take another look tomorrow. |
akerouanton
left a comment
There was a problem hiding this comment.
Beside my two comments, LGTM.
| if etype == driverapi.Delete { | ||
| opname = "delete" | ||
| err = n.peerDelete(eid, addr, mac, vtep) | ||
| } else { |
There was a problem hiding this comment.
Create and Update are treated the same. Could you add a comment explaining why?
There was a problem hiding this comment.
That's the way it was before I started refactoring. 🤷
I only touched this part of the code to refactor the error handling. The control flow is the same as it was before.
moby/libnetwork/drivers/overlay/joinleave.go
Lines 196 to 201 in aaf3e76
Come to think of it, it is likely subtly incorrect to handle creates the same as updates. While the driver never performs an update operation on an overlay_peer_table entry (libnetwork doesn't afford a mechanism for drivers to do so) it could still be problematic if it is possible for the same endpoint ID to leave and rejoin a network, as the DELETE and re-CREATE could be coalesced into a single UPDATE if a peer receives the CREATE event first. This might be one of the root causes of #49908! I should fix that.
| defer func() { | ||
| d.mu.Unlock() | ||
| if doPeerFlush { | ||
| d.peerFlush(nid) |
There was a problem hiding this comment.
We don't flush anymore — is it unneeded in the end? Or replaced by something else?
There was a problem hiding this comment.
It's been replaced by delete(d.networks, nid) as the peerdb for a network is now a field on the network struct, not a key in another map that needs to be cleaned up explicitly.
- What I did
- How I did it
The concurrency control in the overlay driver is logically unsound. While the use of mutexes is sufficient to prevent data races -- violations of the Go memory model -- many operations which need to be atomic are performed with unbounded concurrency. For instance, programming the encryption parameters for a peer can race with encryption keys being updated, which could lead to inconsistencies between the parameters programmed into the kernel and the desired state.
Overhaul the use of locks in the overlay network driver. Implement sound locking at the network granularity: operations may proceed concurrently iff they are being applied to distinct networks. Push the responsibility of locking up to the code which calls methods or accesses struct fields to avoid deadlock situations like we had previously with d.initSandboxPeerDB() and to make the code easier to reason about.
Each overlay network has a distinct peer db. The NetworkDB watch for the overlay peer table for the network will only start after (*driver).CreateNetwork returns and will be stopped before libnetwork calls (*driver).DeleteNetwork, therefore the lifetime of the peer db for a network is constrained to the lifetime of the network itself. Yet the peer db for a network is tracked in a dedicated map, separately from the network objects themselves. This has resulted in a parallel set of mutexes to manage concurrency of the peer db distinct from the mutexes for the driver and networks. Move the peer db for a network into a field of the network struct and guard it from concurrent access using the per-network lock. Move the methods for manipulating the peer db into the network struct so that the methods can only be called if the caller has a reference to the network object.
Network creation and deletion is synchronized using the driver-scope mutex, but some of the kernel programming is performed outside of the critical section. It is possible for network deletion to race with recreating the network, interleaving the kernel programming for the network creation and deletion, resulting in inconsistent kernel state. Parallelize network creation and deletion soundly. Use a double-checked locking scheme to soundly handle the case of concurrent CreateNetwork and DeleteNetwork for the same network id without blocking operations on other networks. Synchronize operations on a network so that operations on the network such as adding a neighbor to the peer db are performed atomically, not interleaved with deleting the network.
Handle encryption-key updates in a critical section so they can no longer be interleaved with kernel programming of encryption parameters.
- How to verify it
docker service create --mode global --network my-overlay -p 80:8080 nginxdemos/hello:plain-text- Human readable description for the release notes
- Improve the reliability of the overlay network driver- A picture of a cute animal (not mandatory but encouraged)