libnetwork/d/overlay: simplify the encryption logic#50091
Merged
akerouanton merged 7 commits intomoby:masterfrom Jun 23, 2025
Merged
libnetwork/d/overlay: simplify the encryption logic#50091akerouanton merged 7 commits intomoby:masterfrom
akerouanton merged 7 commits intomoby:masterfrom
Conversation
The overlay driver's checkEncryption function configures the IPSec parameters for the VXLAN tunnels to peer nodes. When called with isLocal=true, it configures encryption for all peer nodes with at least one peerDB entry. Since the local peers are also included in the peerDB, it needs to filter those entries out. It does so by filtering out any peer entries whose VTEP address is equal to the current local advertise address. Trouble is, the local advertise address is not necessarily constant. The driver tries to handle this case by calling peerDBUpdateSelf() when the advertise address changes. This function iterates through the peerDB and tries to update the VTEP address for all local peer entries, but it does not actually do anything: it mutates a temporary copy of the entry which is not persisted back into the peerDB. (It used to be functional, but was broken when the peerDB was extended to use SetMatrix.) So there may be cases where local peer entries are not filtered out properly, resulting in spurious encryption parameters being programmed into the kernel. Filter out local peers when walking the peerDB by filtering on whether the entry has the isLocal flag set. Remove the no-op code which attempts to update local entries in the peerDB. No other code takes any interest in the VTEP value for isLocal peer entries. Signed-off-by: Cory Snider <csnider@mirantis.com>
The VTEP value for a peer in peerDB is only accurate for a remote peer. The VTEP for a local peer would be the driver's advertise address, which is not necessarily constant for the lifetime of the driver instance. The VTEP values persisted in the peerDB entries for local peers could be stale or missing if not kept in sync with the advertise address. And the peerDB could get polluted with duplicate entries for local peers if the advertise address was to change, as entries which differ only by VTEP are considered distinct by SetMatrix. Persisting the advertise address as the VTEP for local peers creates lots of problems that are not easy to solve. Stop persisting the VTEP for local peers in peerDB. Any code that needs to know the VTEP for local peers can look that up from the source of truth: the driver's advertise address. Use the lack of a VTEP in peerDB entries to signify local peers, making the isLocal flag redundant. Signed-off-by: Cory Snider <csnider@mirantis.com>
Drop the isLocal boolean parameters from the peerDB functions. Local
peers have vtep == netip.Addr{}.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Since it is not meaningful to add or remove encryption between the local
node and itself, the isLocal parameter is redundant. Setting up
encryption for all network peers is now invoked by calling
checkEncryption(nid, netip.Addr{}, true)
Calling checkEncryption with isLocal=true, add=false is now more
explicitly a no-op. It always was effectively a no-op, but that was not
easy to spot by inspection. In the world with the isLocal flag,
calls to checkEncryption where isLocal=true and add=false would have rIP
set to d.advertiseAddr. In other words, it was a request to remove
encryption parameters between the local peer and itself if peerDB had no
remote-peer entries for the network. So either the call would do
nothing, or it would remove encryption parameters that aren't used for
anything. Now the equivalent call always does nothing.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The setupEncryption and removeEncryption functions take several parameters, but all call sites pass the same values for all the parameters aside from remoteIP: values taken from fields of the driver struct. Refactor these functions to be methods of the driver struct and drop the redundant parameters. Signed-off-by: Cory Snider <csnider@mirantis.com>
In addition to being three functions in a trenchcoat, the checkEncryption function has a very subtle implementation which is difficult to reason about. That is not a good property for security relevant code to have. Replace two of the three calls to checkEncryption with conditional calls to setupEncryption and removeEncryption, lifting the conditional logic which was hidden away in checkEncryption into the call sites to make it easier to reason about the code. Replace the third call with a call to a new initEncryption function. Signed-off-by: Cory Snider <csnider@mirantis.com>
The (*driver).Join function does many things to set up overlay networking. One of the first things it does is call (*network).joinSandbox, which in turn calls (*driver).initSandboxPeerDB. The initSandboxPeerDB function iterates through the peer db to add entries to the VXLAN FDB, neighbor table and IPsec security association database in the kernel for all known peers on the overlay network. One of the last things the (*driver).Join function does is call (*driver).initEncryption. The initEncryption function iterates through the peer db to add entries to the IPsec security association database in the kernel for all known peers on the overlay network. But the preceding initSandboxPeerDB call already did that! The initEncryption function is redundant and can safely be removed. Signed-off-by: Cory Snider <csnider@mirantis.com>
0014e3d to
df6b405
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR simplifies the overlay driver’s encryption and peer-management logic by removing the explicit isLocal flag, refactoring peer DB operations, and introducing dedicated setupEncryption/removeEncryption methods.
- Replaced
isLocalboolean withpeerEntry.isLocal()method based onvtepvalidity - Streamlined
peerDbAdd/peerDbDeleteandpeerAdd/peerDeletesignatures - Removed
checkEncryptionand introducedsetupEncryption/removeEncryptionondriver
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| peerdb.go | Removed isLocal field, added isLocal() method, cleaned up DB walk functions |
| overlay.go | Dropped localJoinOnce and self-update on initial node join |
| joinleave.go | Updated peerAdd/peerDelete calls to use invalid vtep for local peers |
| encryption.go | Refactored encryption: removed checkEncryption, added setupEncryption and removeEncryption |
Comments suppressed due to low confidence (2)
libnetwork/drivers/overlay/encryption.go:118
- Consider adding unit tests for
setupEncryptionandremoveEncryptionto verify that encryption SAs are correctly programmed and removed under various scenarios.
func (d *driver) setupEncryption(remoteIP netip.Addr) error {
libnetwork/drivers/overlay/peerdb.go:238
- On the first peer addition,
n.endpointsmay still be empty, sosetupEncryptionwill be skipped. Consider triggering encryption setup based on theinsertedflag (returned bypeerDbAdd) rather thanlen(n.endpoints).
if n.secure && len(n.endpoints) > 0 {
| em.Lock() | ||
| indices, ok := em.nodes[remoteIP] | ||
| em.Unlock() | ||
| func (d *driver) removeEncryption(remoteIP netip.Addr) error { |
There was a problem hiding this comment.
[nitpick] Add a doc comment for removeEncryption explaining that it removes encryption parameters for a given remote IP when the last endpoint is removed.
Member
|
@akerouanton do you have time to have a look? |
akerouanton
approved these changes
Jun 23, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
- What I did
I refactored the encryption related code in the overlay driver to simplify the implementation, making it easier to understand and work on further.
- How I did it
Iteratively -- see individual commits for details.
- How to verify it
docker service create --mode global --network my-overlay nginxdemos/hello:plain-text- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)