-
Notifications
You must be signed in to change notification settings - Fork 38.7k
wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue #18280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wallet: UnregisterValidationInterface before SyncWithValidationInterfaceQueue #18280
Conversation
8e8695d to
12374e6
Compare
c1ee308 to
afe8d39
Compare
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
About to look into this but this PR description seems like it is going to take a lot of digging to understand. Is it possible to summarize in a paragraph what this change is doing, what problem it fixes, and how it compares to the alternative? Thank you! 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to help other reviewers, this seems to make three changes:
-
Instead of RegisterValidationInterface being non-blocking and beginning to send notifications right away, even if they preceded the RegisterValidationInterface call, it will now block waiting for any previously queued notifications to be drained, then attach the notification handler, then finally return.
-
Instead of UnregisterValidationInterface being non-blocking and turning off notifications right away, it waits for any notifications that were queued prior to the UnregisterValidationInterface to be handled and blocks. Then it disconnects and returns after they were handled.
-
Stops holding cs_main while sending the loadwallet notification.
Change 3 seems like a good change just because it reduces unnecessary locking, but it's not clear what motivated the change. Maybe it would fix the deadlock @fanquake alluded to in #16307 (comment), but it's not possible to to tell without a backtrace.
Change 1 at first glance seems like it more likely lead to missing notifications on startup, but maybe it's intended to prevent the wallet redundantly processing mempool transactions blocks that have just been rescanned? Unclear
Change 2 seems like it could fix missing notifications on shutdown, but I'm not sure where that's been reported. It also seems like it could slow down shutdown unnecessarily.
Note: I have a change which handles attaching notifications in a more deliberate nonracy way in #15719. Not sure if it overlaps with bugs that are being seen now, though. And it's nice to have more minimal bugfixes anyway.
src/wallet/wallet.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is no longer true, and seems like it means there can now be missing notifications. I wonder if a different approach would be to send the loadwallet notification before locking the chain & scanning instead of after?
|
Sorry @ryanofsky, should have tagged wip. Will address your comments and address each required change once this is good. Thanks for reviewing and your comments - will also revisit #15719. |
afe8d39 to
477a4b7
Compare
477a4b7 to
bba30cb
Compare
|
Updated OP. Reduced the scope to just fix #16307. Will submit other PRs with relevant changes from previous commits. |
56db247 to
2a42fc1
Compare
|
Updated to include dc62f3ce5228a611d973a154eb338e1fd34df38f which ensures that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91. I left one comment below that would be nice to address. Also dc62f3ce5228a611d973a154eb338e1fd34df38f could use a better commit description. It's unclear when SyncWithValidationInterfaceQueue would hang now. Is there a shutdown race where wallets are outliving the scheduler? Or is this just a test workaround for cases where scheduler is not running?
src/scheduler.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (dc62f3ce5228a611d973a154eb338e1fd34df38f)
I think it would make more sense to add this logic at the bottom of AddToProcessQueue (and avoid calling MaybeScheduleProcessQueue in no thread case) than having it at the top of MaybeScheduleProcessQueue and returning early.
Few reasons:
- Having this logic doesn't seem useful in ProcessQueue, the other place where MaybeScheduleProcessQueue is called
- Adding this code here creates recursion ProcessQueue -> MaybeScheduleProcessQueue -> EmptyQueue -> ProcessQueue
- If code is added here the name
MaybeScheduleProcessQueueis now misleading because this is running callbacks immediately, not scheduling them for later
2a42fc1 to
c9ca6c4
Compare
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK c9ca6c407adc3bf43a968b6aecd3ea4cbec1ea54. Just suggested scheduler changes since last review
src/scheduler.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (201eebf13b2723d7fe821d7222600fa04fadc0b0)
Commit message subject is out of date since MaybeScheduleProcessQueue is no longer changing.
Also I'm surprised by "SyncWithValidationInterfaceQueue is called after
scheduler thread exists, which happen at shutdown" since I would hope wallets are getting unloaded before the scheduler stops, but maybe this is not the case currently. Probably beyond the scope of this PR to address, though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since I would hope wallets are getting unloaded before the scheduler stops
Not currently the case.
Probably beyond the scope of this PR to address, though
Agree, I've tried and it caused some other failures.
Commit message subject is out of date since MaybeScheduleProcessQueue is no longer changing.
Ops need to update.
src/scheduler.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (201eebf13b2723d7fe821d7222600fa04fadc0b0)
If we can make this code:
if (m_pscheduler->AreThreadsServicingQueue()) [
MaybeScheduleProcessQueue();
} else {
EmptyQueue();
}it would be more obvious what's going on here.
Otherwise if the unconditional MaybeScheduleProcessQueue call is needed it would be good have a comment saying what it's needed for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be inside EmptyQueue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be inside
EmptyQueue
I wouldn't change emptyqueue because it's called other places which don't need this logic, and I can't see how the behavior would fit in there (how you could coherently describe it or change the emptyqueue name to reflect it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. However if the scheduler thread is stopped after AreThreadsServicingQueue then the function won't be called right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test wallet_multiwallet fails due to this code path, see my changes in EmptyQueue, you should schedule process queue when event queue is not empty, the logic is reversed there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes much more sense compare to current version
void SingleThreadedSchedulerClient::EmptyQueue() {
if (!m_pscheduler->AreThreadsServicingQueue())
return;
auto pendingCallbacks = [this]() -> bool {
LOCK(m_cs_callbacks_pending);
return !m_callbacks_pending.empty();
};
bool should_continue = pendingCallbacks();
while (should_continue) {
ProcessQueue();
should_continue = pendingCallbacks();
}
}
- You don't need to do stupid things, like here one,
if (!m_pscheduler->AreThreadsServicingQueue()) - You don't need to check against is there pending callbacks it's done there, so if you don't have pending ones you don't need to enter eventually endless waiting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iv'e been trying to reproduce appveyor error wihtout any luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- You don't need to do stupid things, like here one,
if (!m_pscheduler->AreThreadsServicingQueue())- You don't need to check against is there pending callbacks it's done there, so if you don't have pending ones you don't need to enter eventually endless waiting.
@bvbfan, I'm confused by what you are saying here. You seem to be saying you don't need two things, but your code sample is doing both those things. Are you saying that for some reason it is better to do (1) in EmptyQueue instead of AddToProcessQueue? If so, what is the reason you think this? To me it seems better to do (1) in AddToProcessQueue instead of EmptyQueue, because AddToProcessQueue has to be aware of thread state while EmptyQueue does not.
I'm even more confused what you are saying about (2) because you're saying we don't need a check for pending callbacks but your code is actually adding an extra check (before the first loop iteration).
More importantly, I can't figure out what this suggestion has to do with wallet_multiwallet.py. If you can clarify, it would be much appreciated. I've also been trying to reproduce the failure there and look for possible causes and haven't been able to figure it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by what you are saying here. You seem to be saying you don't need two things, but your code sample is doing both those things.
I mean you don't need to do these things outside else, i was not clear. These 2 things should be done here in EmptyQueue
More importantly, I can't figure out what this suggestion has to do with wallet_multiwallet.py
if (m_pscheduler->AreThreadsServicingQueue()) [
MaybeScheduleProcessQueue();
} else {
EmptyQueue();
}
This code is wrong, you try to process / flush events but it does not do that. Let analyse current code
void SingleThreadedSchedulerClient::EmptyQueue() {
assert(!m_pscheduler->AreThreadsServicingQueue()); // <- assert (you can avoid that)
bool should_continue = true;
while (should_continue) {
ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
LOCK(m_cs_callbacks_pending);
should_continue = !m_callbacks_pending.empty();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProcessQueue(); // <--- here is the bigger mistake, it tries to process events before it's checked that has pending callbacks, you can enter this and wait to infinity (since app is shutting down) it should be checked for pendings before process queue
Explain this more? ProcessQueue doesn't wait for callbacks if there aren't any, it just runs a callbacks if there is one.
Iv'e been trying to reproduce appveyor error wihtout any luck.
I finally reproduced the problem cutting down the test and running in a loop with --usecli. Looking at the stack trace I see unloadwallet RPC thread waiting for scheduler thread MaybeCompactWalletDB call, which is stuck in ReleaseWallet calling SyncWithValidationInterfaceQueue, which is stuck because SyncWithValidationInterfaceQueue will always hang if called from the scheduler thread. This is basically the same deadlock that was in fanquake's backtrace from #16307 (comment), which suggests that 2a42fc1bdb393f88d72ec4d5c93b43b2b86dcc91 and 2097e35 aren't going to work as fixes for this problem in their current form. Maybe more can be done to get them working but an alternate approach like 06d2b53 (branch, comment) might be easier at this point
This is necessary if SyncWithValidationInterfaceQueue is called after scheduler thread exists, which happen at shutdown.
c9ca6c4 to
2097e35
Compare
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 2097e35. Just suggested change clarifying AddToProcessQueue / MaybeScheduleProcessQueue logic since last review
| m_callbacks_pending.emplace_back(std::move(func)); | ||
| } | ||
| MaybeScheduleProcessQueue(); | ||
| if (m_pscheduler->AreThreadsServicingQueue()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Handle no threads serving in MaybeScheduleProcessQueue" (03cd591)
Was experimenting and wrote a small test for this change:
// Ensure scheduled tasks run even when there are no scheduler threads
BOOST_AUTO_TEST_CASE(singlethreadedscheduler_nothreads)
{
CScheduler scheduler;
SingleThreadedSchedulerClient client(&scheduler);
bool task_ran = false;
client.AddToProcessQueue([&] { task_ran = true; });
BOOST_CHECK(task_ran);
}Could add this to scheduler_tests.cpp. It fails without this commit and passes with it.
|
Replaced by #18338. |
From boost signal documentation at https://www.boost.org/doc/libs/1_72_0/doc/html/signals2/thread-safety.html:
This means that
UnregisterValidationInterfacedoesn't prevent more calls to that interface.This PR fixes the assumption that no validation calls happen after
UnregisterValidationInterfacein the following codebitcoin/src/wallet/wallet.cpp
Lines 114 to 115 in 5d92ac2
The actual bug is that
delete walletraces with the validation notifications.The fix consists in synchronizing with the validation queue (which happens in
BlockUntilSyncedToCurrentChain) afterUnregisterValidationInterface.Alternative to #18279. Fixes #16307.