-
Notifications
You must be signed in to change notification settings - Fork 38.7k
Fix some race conditions in BanMan::DumpBanlist()
#24168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/banman.cpp
Outdated
| { | ||
| static Mutex dump_mutex; | ||
| TRY_LOCK(dump_mutex, dump_lock); | ||
| if (!dump_lock) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is is ok to refuse to dump when two threads call this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an optimization. Calling this function in the destructor guarantees that the final state will be dumped to the disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it should be mentioned in the release notes that calling the ban rpc no longer implies the stuff is written to disk and when the node crashes, the state will be lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not relevant since the recent update, I guess.
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK e06d092
This PR :
. adds m_banned_mutex to prevent the m_is_dirty value from being read in a different state in BannedSetIsDirty() and SweepBanned().
. gets rid of GetBanned() so it doesn't call SweepBanned() twice and sets banmap = m_banned. This works because the m_banned_mutex is already held in this operation.
. adds the same strategy used in DumpMempool to avoid concurrent writing to disk, but adds an optimization to return early if a write is in progress.
. Since there is only m_is_dirty = true; (not = false) outside of BanMan::DumpBanlist(), the last commit makes sense to avoid inconsistent values for m_dirty in a race condition.
shaavan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Going commit wise:
-
Concurrent calls to
Write()have been prevented by "Trying" to lockdump_lockand exit if locking fails. However, as @MarcoFalke pointed out, it can lead to ban rpc calls in whichDumpBanList()could not write anything to disk, leading to potential data loss. This behavior should be appropriately documented.
Other than this, it could also lead to writing failure when called through other functions as well. What do you think about introducing a mutex for this function that allows calling DumpBanList only when weAssertLockNotHeld(new_mutex). -
SweepBanned and BannedSetIsDirty are brought under the same lock block to prevent race conditions for
m_is_dirty. I agree with this approach. -
Prevented SweepBanned to be called twice by directly setting the value of
banmap(=m_banned) instead of callingGetBanned(). I think this is an improvement over the current code. -
Modifying the code to call
SetBannedSetDirty(true);throughout the codebase prevents the risk of inconsistent value due to race conditions. I agree with this change.
|
Perhaps the same strategy used in static Mutex dump_mutex;
LOCK(dump_mutex); |
The m_is_dirty value being read in BannedSetIsDirty() can differ from the value being set in SweepBanned(), i.e., be inconsistent with a BanMan instance internal state.
Another thread can `SetBannedSetDirty(true)` while `CBanDB::Write()` call being executed. The following `SetBannedSetDirty(false)` effectively makes `m_is_dirty` flag value inconsistent with the actual `m_banned` state. Such behavior can result in data loss, e.g., during shutdown.
|
Updated e06d092 -> 99a6b69 (pr24168.01 -> pr24168.02, diff). Lock optimization dropped as an any optimization must be justified with benchmarking, at least. That was not done in this case. |
w0xlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK 99a6b69
Optimization removed, but the dump behavior now is more predictable.
|
Concept ACK |
shaavan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 99a6b69
Changes since my last review:
- Changed logic of preventing concurrent write by creating a static mutex and locking it.
@hebasto
We can move on to the optimization logic after we have some benchmark results. Until then, we can stick with the current logic, which is more reliable and predictable.
luke-jr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
|
|
||
| int64_t n_start = GetTimeMillis(); | ||
| if (!m_ban_db.Write(banmap)) { | ||
| SetBannedSetDirty(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: Is m_dirty even used on shutdown when true? Shouldn't the below log be adjusted when it fails?
99a6b69 Fix race condition for SetBannedSetDirty() calls (Hennadii Stepanov) 83c7646 Avoid calling BanMan::SweepBanned() twice in a row (Hennadii Stepanov) 33bda6a Fix data race condition in BanMan::DumpBanlist() (Hennadii Stepanov) 5e20e9e Prevent possible concurrent CBanDB::Write() calls (Hennadii Stepanov) Pull request description: This PR split from bitcoin#24097 with some additions. This makes the following switch from `RecursiveMutex` to `Mutex` a pure refactoring. See details in commit messages. ACKs for top commit: w0xlt: reACK 99a6b69 shaavan: ACK 99a6b69 Tree-SHA512: da4e7268c7bd3424491f446145f18af4ccfc804023d0a7fe70e1462baab550a5e44f9159f8b9f9c7820d2c6cb6447b63883616199e4d9d439ab9ab1b67c7201b
This PR split from #24097 with some additions. This makes the following switch from
RecursiveMutextoMutexa pure refactoring.See details in commit messages.