libn/networkdb: stop forging tombstone entries#50342
Merged
corhere merged 1 commit intomoby:masterfrom Jul 10, 2025
Merged
Conversation
When a node leaves a network, all entries owned by that node are implicitly deleted. The other NetworkDB nodes handle the leave by setting the deleted flag on the entries owned by the left node in their local stores. This behaviour is problematic as it results in two conflicting entries with the same Lamport timestamp propagating through the cluster. Consider two NetworkDB nodes, A, and B, which are both joined to some network. Node A in quick succession leaves the network, immediately rejoins it, then creates an entry. If Node B processes the entry-creation event first, it will add the entry to its local store then set the deleted flag upon processing the network-leave. No matter how many times B bulk-syncs with A, B will ignore the live entry for having the same timestamp as its local tombstone entry. Once this situation occurs, the only way to recover is for the entry to get updated by A with a new timestamp. There is no need for a node to store forged tombstones for another node's entries. All nodes will purge the entries naturally when they process the network-leave or node-leave event. Simply delete the non-owned entries from the local store so there is no inconsistent state to interfere with convergence when nodes rejoin a network. Have nodes update their local store with tombstones for entries when leaving a network so that after a rapid leave-then-rejoin the entry deletions propagate to nodes which may have missed the leave event. Signed-off-by: Cory Snider <csnider@mirantis.com>
There was a problem hiding this comment.
Pull Request Overview
Improves NetworkDB’s handling of node leaves by deleting non-owned entries instead of forging tombstones and adds a regression test for out-of-order leave/rejoin events.
- Switches
deleteNodeNetworkEntriesanddeleteNodeTableEntriesto remove remote-owned entries outright and only mark local entries for deletion. - Updates
LeaveNetworkto inline the new deletion logic, removing obsolete tombstone-forging behavior. - Adds
TestLeaveRejoinOutOfOrderto verify entries don’t get stuck deleted when leave/join events and table events arrive interleaved.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tableevent_test.go | New regression test for leave/join/table-event order |
| networkdb.go | Overhauled entry-deletion logic on node/network leave |
| delegate.go | Simplified delegate to call updated deletion logic |
Comments suppressed due to low confidence (3)
libnetwork/networkdb/networkdb.go:517
- The doc comment should be updated to reflect that this now only deletes entries owned by the specified node, rather than handling local vs remote branches.
func (nDB *NetworkDB) deleteNodeNetworkEntries(nid, node string) {
libnetwork/networkdb/networkdb.go:540
- [nitpick] The parameter
nodeis ambiguous; consider renaming it toownerNodeortargetNodeto clarify that it refers to the entry owner.
func (nDB *NetworkDB) deleteNodeTableEntries(node string) {
libnetwork/networkdb/tableevent_test.go:227
- The new test covers remote entry creation out of order; consider adding a complementary test to verify that local entries are correctly tombstoned and propagated when the local node leaves and rejoins.
func TestLeaveRejoinOutOfOrder(t *testing.T) {
akerouanton
approved these changes
Jul 8, 2025
robmry
approved these changes
Jul 10, 2025
Contributor
robmry
left a comment
There was a problem hiding this comment.
LGTM - not merging as it's maintainer spotlighted, maybe there's something more to discuss later today?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
- What I did
- How I did it
When a node leaves a network, all entries owned by that node are implicitly deleted. The other NetworkDB nodes handle the leave by setting the deleted flag on the entries owned by the left node in their local stores. This behaviour is problematic as it results in two conflicting entries with the same Lamport timestamp propagating through the cluster.
Consider two NetworkDB nodes, A, and B, which are both joined to some network. Node A in quick succession leaves the network, immediately rejoins it, then creates an entry. If Node B processes the entry-creation event first, it will add the entry to its local store then set the deleted flag upon processing the network-leave. No matter how many times B bulk-syncs with A, B will ignore the live entry for having the same timestamp as its local tombstone entry. Once this situation occurs, the only way to recover is for the entry to get updated by A with a new timestamp.
There is no need for a node to store forged tombstones for another node's entries. All nodes will purge the entries naturally when they process the network-leave or node-leave event. Simply delete the non-owned entries from the local store so there is no inconsistent state to interfere with convergence when nodes rejoin a network. Have nodes update their local store with tombstones for entries when leaving a network so that after a rapid leave-then-rejoin the entry deletions propagate to nodes which may have missed the leave event.
- How to verify it
With a new unit test.
- Human readable description for the release notes
- Fix a bug in NetworkDB which would sometimes cause entries to get stuck deleted on some of the nodes, leading to connectivity issues between containers on overlay networks.- A picture of a cute animal (not mandatory but encouraged)