Fix missed notification race condition in priority scheduler #5007
Conversation
Test Results for Commit f0e1478 (Pull Request 5007) — last updated: 2026-01-24 17:16:02 UTC
Pull request overview
This PR refactors the priority scheduler’s pool synchronization to remove a race that could cause missed wakeups and stuck blocks, and adds an observable activation event plus a stress test to validate the fix.
Changes:
- Replaced `nano::locked<priority_pool>` with a plain `priority_pool` guarded by `priority::mutex`, and updated all pool access sites (`activate`, `push`, `contains`, size/empty, `run`, `predicate`, `container_info`) to use the unified lock.
- Introduced a `batch_activated` `observer_set` on the priority scheduler, invoked from the run loop after activations and before erasing from the pool, to allow external tracking of activated block hashes without holding the scheduler lock.
- Added a `priority_scheduler.stress_test` that exercises asynchronous activations via the new `batch_activated` hook to detect missed notifications, using a dedicated worker `thread_pool`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `nano/node/scheduler/priority.hpp` | Adds `observer_set`-based `batch_activated` event and switches the scheduler's pool from `nano::locked<priority_pool>` to a raw `priority_pool` protected by the scheduler mutex. |
| `nano/node/scheduler/priority.cpp` | Adjusts all interactions with `priority_pool` to be protected by `priority::mutex`, ensures `predicate()` and the run loop observe a consistent view of the pool, and emits `batch_activated` notifications outside the scheduler lock before erasing from the pool. |
| `nano/core_test/priority_scheduler.cpp` | Includes `thread_pool` and adds a stress test that chains activations through the new `batch_activated` callback to validate that no pool notifications are missed under concurrent conditions. |
```cpp
node.scheduler.priority.batch_activated.add ([&] (auto const & batch) {
	activated_count += batch.size ();

	workers.post ([&] () {
		std::this_thread::yield (); // Increase timing variability

		// Activate the next account
		auto idx = next_to_activate.fetch_add (1);
		if (idx < blocks.size ())
		{
			auto txn = node.ledger.tx_begin_read ();
			EXPECT_TRUE (node.scheduler.priority.activate (txn, blocks[idx]->account ()));
		}
	});
});
```
The lambda registered with `batch_activated` captures `activated_count`, `next_to_activate`, `blocks`, and `workers` by reference, but these locals are destroyed before `system` (and thus `node`/`scheduler.priority`) at the end of the test. Because the scheduler thread may still invoke this callback until `system`/`node` are torn down, there is a window where the observer can run and dereference dangling references, leading to UB/races. To fix this, either (1) declare `system` (and `node`) before the captured state, so that `system` is destroyed first and stops the scheduler before the captured variables are destroyed, or (2) move the captured state into a heap-allocated/shared object and capture it by value, so its lifetime is decoupled from the stack frame and explicitly controlled.
Fixes a race condition where notifications could be missed in the priority scheduler, causing blocks to get stuck in the pool indefinitely.