Skip to content

libn/networkdb: stop forging tombstone entries#50342

Merged
corhere merged 1 commit intomoby:masterfrom
corhere:libn/fix-networkdb-tombstone-bug
Jul 10, 2025
Merged

libn/networkdb: stop forging tombstone entries#50342
corhere merged 1 commit intomoby:masterfrom
corhere:libn/fix-networkdb-tombstone-bug

Conversation

@corhere
Copy link
Contributor

@corhere corhere commented Jul 7, 2025

- What I did

- How I did it
When a node leaves a network, all entries owned by that node are implicitly deleted. The other NetworkDB nodes handle the leave by setting the deleted flag on the entries owned by the left node in their local stores. This behaviour is problematic as it results in two conflicting entries with the same Lamport timestamp propagating through the cluster.

Consider two NetworkDB nodes, A, and B, which are both joined to some network. Node A in quick succession leaves the network, immediately rejoins it, then creates an entry. If Node B processes the entry-creation event first, it will add the entry to its local store then set the deleted flag upon processing the network-leave. No matter how many times B bulk-syncs with A, B will ignore the live entry for having the same timestamp as its local tombstone entry. Once this situation occurs, the only way to recover is for the entry to get updated by A with a new timestamp.

There is no need for a node to store forged tombstones for another node's entries. All nodes will purge the entries naturally when they process the network-leave or node-leave event. Simply delete the non-owned entries from the local store so there is no inconsistent state to interfere with convergence when nodes rejoin a network. Have nodes update their local store with tombstones for entries when leaving a network so that after a rapid leave-then-rejoin the entry deletions propagate to nodes which may have missed the leave event.

- How to verify it
With a new unit test.

- Human readable description for the release notes

- Fix a bug in NetworkDB which would sometimes cause entries to get stuck deleted on some of the nodes, leading to connectivity issues between containers on overlay networks.

- A picture of a cute animal (not mandatory but encouraged)

When a node leaves a network, all entries owned by that node are
implicitly deleted. The other NetworkDB nodes handle the leave by
setting the deleted flag on the entries owned by the left node in their
local stores. This behaviour is problematic as it results in two
conflicting entries with the same Lamport timestamp propagating
through the cluster.

Consider two NetworkDB nodes, A, and B, which are both joined to some
network. Node A in quick succession leaves the network, immediately
rejoins it, then creates an entry. If Node B processes the
entry-creation event first, it will add the entry to its local store
then set the deleted flag upon processing the network-leave. No matter
how many times B bulk-syncs with A, B will ignore the live entry for
having the same timestamp as its local tombstone entry. Once this
situation occurs, the only way to recover is for the entry to get
updated by A with a new timestamp.

There is no need for a node to store forged tombstones for another
node's entries. All nodes will purge the entries naturally when they
process the network-leave or node-leave event. Simply delete the
non-owned entries from the local store so there is no inconsistent state
to interfere with convergence when nodes rejoin a network. Have nodes
update their local store with tombstones for entries when leaving a
network so that after a rapid leave-then-rejoin the entry deletions
propagate to nodes which may have missed the leave event.

Signed-off-by: Cory Snider <csnider@mirantis.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Improves NetworkDB’s handling of node leaves by deleting non-owned entries instead of forging tombstones and adds a regression test for out-of-order leave/rejoin events.

  • Switches deleteNodeNetworkEntries and deleteNodeTableEntries to remove remote-owned entries outright and only mark local entries for deletion.
  • Updates LeaveNetwork to inline the new deletion logic, removing obsolete tombstone-forging behavior.
  • Adds TestLeaveRejoinOutOfOrder to verify entries don’t get stuck deleted when leave/join events and table events arrive interleaved.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tableevent_test.go New regression test for leave/join/table-event order
networkdb.go Overhauled entry-deletion logic on node/network leave
delegate.go Simplified delegate to call updated deletion logic
Comments suppressed due to low confidence (3)

libnetwork/networkdb/networkdb.go:517

  • The doc comment should be updated to reflect that this now only deletes entries owned by the specified node, rather than handling local vs remote branches.
func (nDB *NetworkDB) deleteNodeNetworkEntries(nid, node string) {

libnetwork/networkdb/networkdb.go:540

  • [nitpick] The parameter node is ambiguous; consider renaming it to ownerNode or targetNode to clarify that it refers to the entry owner.
func (nDB *NetworkDB) deleteNodeTableEntries(node string) {

libnetwork/networkdb/tableevent_test.go:227

  • The new test covers remote entry creation out of order; consider adding a complementary test to verify that local entries are correctly tombstoned and propagated when the local node leaves and rejoins.
func TestLeaveRejoinOutOfOrder(t *testing.T) {

Copy link
Contributor

@robmry robmry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - not merging as it's maintainer spotlighted, maybe there's something more to discuss later today?

@corhere corhere merged commit 0059929 into moby:master Jul 10, 2025
176 of 184 checks passed
@corhere corhere deleted the libn/fix-networkdb-tombstone-bug branch July 10, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NetworkDB does not always reliably converge

6 participants