Skip to content

fix(nodedb): fix race between updating fast node cache and db commit#1142

Merged
mattac21 merged 8 commits into
masterfrom
ma/debug
Apr 6, 2026
Merged

fix(nodedb): fix race between updating fast node cache and db commit#1142
mattac21 merged 8 commits into
masterfrom
ma/debug

Conversation

@mattac21

@mattac21 mattac21 commented Apr 3, 2026

Copy link
Copy Markdown
Collaborator

Fixes a race condition where concurrent GetFastNode calls (from RPC queries) can repopulate the fast node cache with stale data that is about to be overridden from the tree during SaveVersion. After Commit within SaveVersion the cache is not repopulated with the correct data from the tree, causing the cache to serve that stale data to future readers. At the application level, this leads to an app hash mismatches.

To fix this, we introduce a pendingFastNodeAdditions and pendingFastNodeRemovals that store changes to the fast node cache when adding nodes via saveFastNodeUnlocked or DeleteFastNode. We then defer the addition or removal of nodes from the fast node cache until after the tree changes commit, meaning there is no period of time where we could have removed a node from the cache, then brought back up an incorrect value from the tree.

This also fixes a separate race between SaveVersion and accessing getLatestVersion that was exercised by the regression tests.

@mattac21 mattac21 marked this pull request as ready for review April 3, 2026 18:01
Comment thread immutable_tree.go
// If the tree is of the latest version and fast node is not in the tree
// then the regular node is not in the tree either because fast node
// represents live state.
if t.version == t.ndb.latestVersion {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fixing a separate race condition that was exercised during the test

Comment thread nodedb.go
Comment on lines +92 to +93
pendingFastNodeAdditions []*fastnode.Node // Fast nodes to add to cache after batch commit.
pendingFastNodeRemovals [][]byte // Fast node keys to remove from cache after batch commit.

@mattac21 mattac21 Apr 3, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new, this is the actual fix for the main race causing the app hash

Comment thread nodedb.go
// return -1, nil
}

func (ndb *nodeDB) getCachedLatestVersion() int64 {

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already an accessor for getLatestVersion, but it has the potential to do a heavy db scan on latest version being 0. The previous logic did not do this, so this lighter weight accessor was added. Again this is for the separate race condition being fixed to make the test not race, not the app hash.

@mattac21 mattac21 merged commit 146f723 into master Apr 6, 2026
8 checks passed
@mattac21 mattac21 deleted the ma/debug branch April 6, 2026 14:54
@mattac21

mattac21 commented Apr 6, 2026

Copy link
Copy Markdown
Collaborator Author

@Mergifyio backport release/v1.2.x

@mergify

mergify Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

backport release/v1.2.x

✅ Backports have been created

Details

Cherry-pick of 146f723 has failed:

On branch mergify/bp/release/v1.2.x/pr-1142
Your branch is up to date with 'origin/release/v1.2.x'.

You are currently cherry-picking commit 146f723.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   immutable_tree.go
	modified:   mutable_tree.go
	modified:   nodedb_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   nodedb.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@mattac21

mattac21 commented Apr 6, 2026

Copy link
Copy Markdown
Collaborator Author

@Mergifyio backport release/v1.3.x

@mergify

mergify Bot commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

backport release/v1.3.x

✅ Backports have been created

Details

Cherry-pick of 146f723 has failed:

On branch mergify/bp/release/v1.3.x/pr-1142
Your branch is up to date with 'origin/release/v1.3.x'.

You are currently cherry-picking commit 146f723.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   immutable_tree.go
	modified:   mutable_tree.go
	modified:   nodedb_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   nodedb.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

mergify Bot pushed a commit that referenced this pull request Apr 6, 2026
mattac21 added a commit that referenced this pull request Apr 6, 2026
…(backport #1142) (#1144)

Co-authored-by: mattac21 <matt@cosmoslabs.io>
mattac21 added a commit that referenced this pull request Apr 6, 2026
…(backport #1142) (#1147)

Co-authored-by: mattac21 <matt@cosmoslabs.io>
mattac21 added a commit that referenced this pull request Apr 6, 2026
…(backport #1142) (#1144)

Co-authored-by: mattac21 <matt@cosmoslabs.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants