Skip to content

libnetwork/osl: stop tracking neighbor entries#50088

Merged
robmry merged 4 commits intomoby:masterfrom
corhere:libn/stateless-neighbor
May 27, 2025
Merged

libnetwork/osl: stop tracking neighbor entries#50088
robmry merged 4 commits intomoby:masterfrom
corhere:libn/stateless-neighbor

Conversation

@corhere
Copy link
Contributor

@corhere corhere commented May 27, 2025

- What I did

The Namespace keeps some state for each inserted neighbor-table entry which is used to delete the entry (and any related entries) given only the IP and MAC address of the entry to delete. This state is not strictly required as the retained data is a pure function of the parameters passed to AddNeighbor(), and the kernel can inform us whether an attempt to add a neighbor entry would conflict with an existing entry. Get rid of the neighbor state in Namespace. It's just one more piece of state that can cause lots of grief if it falls out of sync with ground truth.

- How I did it
Require callers to call DeleteNeighbor() with the same aguments as they had passed to AddNeighbor(). Push the responsibility for detecting attempts to insert conflicting entries into the neighbor table onto the kernel by using (*netlink.Handle).NeighAdd() instead of
NeighSet().

- How to verify it
The overlay network driver is the only consumer of the AddNeighbor/DeleteNeighbor API so to verify it, test that overlay networks are still capable of forwarding traffic between containers on different nodes.

  1. Create a multi-node Swarm cluster.
  2. Create a user-defined overlay network.
  3. docker service create --mode global -p 80:8080 nginxdemos/hello:plain-text
  4. Exec into one of the service containers and verify that the other containers can be curl'ed by the IP address of its user-defined overlay network endpoint. Repeat with the other replicas.
  5. curl port 8080 on each Swarm node. Verify that the service is reachable. Repeat to verify that the requests are load-balanced among the replicas.

- Human readable description for the release notes

- A picture of a cute animal (not mandatory but encouraged)

corhere added 2 commits May 27, 2025 11:30
func (*Namespace) AddNeighbor is only ever called with the force
parameter set to false. Remove the parameter and eliminate dead code.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Scope local variables as narrowly as possible.

Signed-off-by: Cory Snider <csnider@mirantis.com>
@corhere corhere added kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/networking Networking kind/refactor PR's that refactor, or clean-up code area/networking/d/overlay Networking labels May 27, 2025
corhere added 2 commits May 27, 2025 11:46
The isDefault and nlHandle fields are immutable once the Namespace is
constructed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
The Namespace keeps some state for each inserted neighbor-table entry
which is used to delete the entry (and any related entries) given only
the IP and MAC address of the entry to delete. This state is not
strictly required as the retained data is a pure function of the
parameters passed to AddNeighbor(), and the kernel can inform us whether
an attempt to add a neighbor entry would conflict with an existing
entry. Get rid of the neighbor state in Namespace. It's just one more
piece of state that can cause lots of grief if it falls out of sync with
ground truth. Require callers to call DeleteNeighbor() with the same
aguments as they had passed to AddNeighbor(). Push the responsibility
for detecting attempts to insert conflicting entries into the neighbor
table onto the kernel by using (*netlink.Handle).NeighAdd() instead of
NeighSet().

Modernize the error messages and logging in DeleteNeighbor() and
AddNeighbor().

Signed-off-by: Cory Snider <csnider@mirantis.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request removes the in-memory tracking of neighbor entries in the namespace and pushes neighbor management entirely to the kernel. Key changes include:

  • Refactoring AddNeighbor and DeleteNeighbor to remove state management and instead rely on netlink operations.
  • Eliminating the neighbors slice from the Namespace struct.
  • Updating callers in the overlay driver to match the new function signatures.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
libnetwork/osl/neigh_linux.go Refactored neighbor add/delete functions and error messaging.
libnetwork/osl/namespace_linux.go Removed neighbor state field from the Namespace struct.
libnetwork/osl/interface_linux.go Simplified interface removal by directly using n.nlHandle.
libnetwork/drivers/overlay/peerdb.go Updated neighbor API calls to match the new function signatures.
Comments suppressed due to low confidence (2)

libnetwork/osl/neigh_linux.go:89

  • [nitpick] Consider renaming the variable 'nlnh' to a more descriptive name (e.g., 'neighEntry') to better reflect its content and improve code clarity.
nlnh, linkName, err := n.nlNeigh(dstIP, dstMac, options...)

libnetwork/osl/interface_linux.go:831

  • Confirm that accessing 'n.isDefault' without a lock remains safe; if 'n.isDefault' is mutable and accessed concurrently, consider using synchronization or atomic variables.
} else if !n.isDefault {

@corhere corhere force-pushed the libn/stateless-neighbor branch from e0d5d62 to 0d6e7cd Compare May 27, 2025 15:46
Comment on lines +23 to +26
func (n NeighborSearchError) Error() string {
var b strings.Builder
b.WriteString("neighbor entry ")
if n.present {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the only part I had a comment on in the other PR. Non-blocking (from my perspective), but reposting here for others; #50081 (comment)

Not a blocker, but was mostly looking here for trade-offs between performance-optimisation (strings.Builder) and discoverability (being able to grep for the (partial) error message); wasn't sure how much this error is in a "hot path".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually used the strings.Builder approach for readability rather than optimization. Note that there are two branches. To achieve something similar with fmt.Sprintf is not much more greppable. E.g.:

s := fmt.Sprintf("neighbor entry %s for IP %v, mac %v", map[bool]string{true: "already exists", false: "not found"}, n.ip, n.mac)
if n.linkName != "" {
        s += fmt.Sprintf(", link %v", n.linkname)
}
return s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I guess I was comparing to the old;

return fmt.Sprintf("Search neighbor failed for IP %v, mac %v, present in db:%t", n.ip, n.mac, n.present)

grepping for neighbor entry gave me many results, but perhaps there's no good option for that.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@robmry robmry added this to the 28.2.0 milestone May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking/d/overlay Networking area/networking Networking kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. kind/refactor PR's that refactor, or clean-up code process/cherry-pick/25.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants