Skip to content

Conversation

@hebasto
Copy link
Member

@hebasto hebasto commented Jan 26, 2022

This PR split from #24097 with some additions. This makes the following switch from RecursiveMutex to Mutex a pure refactoring.

See details in commit messages.

src/banman.cpp Outdated
{
static Mutex dump_mutex;
TRY_LOCK(dump_mutex, dump_lock);
if (!dump_lock) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is is ok to refuse to dump when two threads call this function?

Copy link
Member Author

@hebasto hebasto Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an optimization. Calling this function in the destructor guarantees that the final state will be dumped to the disk.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it should be mentioned in the release notes that calling the ban rpc no longer implies the stuff is written to disk and when the node crashes, the state will be lost.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant since the recent update, I guess.

@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24097 (Replace RecursiveMutex m_cs_banned with Mutex, and rename it by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK e06d092

This PR :

. adds m_banned_mutex to prevent the m_is_dirty value from being read in a different state in BannedSetIsDirty() and SweepBanned().

. gets rid of GetBanned() so it doesn't call SweepBanned() twice and sets banmap = m_banned. This works because the m_banned_mutex is already held in this operation.

. adds the same strategy used in DumpMempool to avoid concurrent writing to disk, but adds an optimization to return early if a write is in progress.

. Since there is only m_is_dirty = true; (not = false) outside of BanMan::DumpBanlist(), the last commit makes sense to avoid inconsistent values for m_dirty in a race condition.

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Going commit wise:

  1. Concurrent calls to Write() have been prevented by "Trying" to lock dump_lock and exit if locking fails. However, as @MarcoFalke pointed out, it can lead to ban rpc calls in which DumpBanList() could not write anything to disk, leading to potential data loss. This behavior should be appropriately documented.
    Other than this, it could also lead to writing failure when called through other functions as well. What do you think about introducing a mutex for this function that allows calling DumpBanList only when we AssertLockNotHeld(new_mutex).

  2. SweepBanned and BannedSetIsDirty are brought under the same lock block to prevent race conditions for m_is_dirty. I agree with this approach.

  3. Prevented SweepBanned to be called twice by directly setting the value of banmap (= m_banned) instead of calling GetBanned(). I think this is an improvement over the current code.

  4. Modifying the code to call SetBannedSetDirty(true); throughout the codebase prevents the risk of inconsistent value due to race conditions. I agree with this change.

@w0xlt
Copy link
Contributor

w0xlt commented Jan 28, 2022

Perhaps the same strategy used in DumpMempool would be a safer approach in 2c77427 .

static Mutex dump_mutex;
LOCK(dump_mutex);

The m_is_dirty value being read in BannedSetIsDirty() can differ from
the value being set in SweepBanned(), i.e., be inconsistent with a
BanMan instance internal state.
Another thread can `SetBannedSetDirty(true)` while `CBanDB::Write()`
call being executed. The following `SetBannedSetDirty(false)`
effectively makes `m_is_dirty` flag value inconsistent with the actual
`m_banned` state. Such behavior can result in data loss, e.g., during
shutdown.
@hebasto
Copy link
Member Author

hebasto commented Jan 28, 2022

Updated e06d092 -> 99a6b69 (pr24168.01 -> pr24168.02, diff).

Lock optimization dropped as an any optimization must be justified with benchmarking, at least. That was not done in this case.
So sorry for a premature optimization.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reACK 99a6b69

Optimization removed, but the dump behavior now is more predictable.

@jonatack
Copy link
Member

Concept ACK

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 99a6b69

Changes since my last review:

  • Changed logic of preventing concurrent write by creating a static mutex and locking it.

@hebasto
We can move on to the optimization logic after we have some benchmark results. Until then, we can stick with the current logic, which is more reliable and predictable.

Copy link
Member

@luke-jr luke-jr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

@maflcko maflcko merged commit eacc0e8 into bitcoin:master Jan 31, 2022

int64_t n_start = GetTimeMillis();
if (!m_ban_db.Write(banmap)) {
SetBannedSetDirty(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated: Is m_dirty even used on shutdown when true? Shouldn't the below log be adjusted when it fails?

@hebasto hebasto deleted the 220126-dump branch January 31, 2022 08:49
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Feb 1, 2022
99a6b69 Fix race condition for SetBannedSetDirty() calls (Hennadii Stepanov)
83c7646 Avoid calling BanMan::SweepBanned() twice in a row (Hennadii Stepanov)
33bda6a Fix data race condition in BanMan::DumpBanlist() (Hennadii Stepanov)
5e20e9e Prevent possible concurrent CBanDB::Write() calls (Hennadii Stepanov)

Pull request description:

  This PR split from bitcoin#24097 with some additions. This makes the following switch from `RecursiveMutex` to `Mutex` a pure refactoring.

  See details in commit messages.

ACKs for top commit:
  w0xlt:
    reACK 99a6b69
  shaavan:
    ACK 99a6b69

Tree-SHA512: da4e7268c7bd3424491f446145f18af4ccfc804023d0a7fe70e1462baab550a5e44f9159f8b9f9c7820d2c6cb6447b63883616199e4d9d439ab9ab1b67c7201b
@bitcoin bitcoin locked and limited conversation to collaborators Jan 31, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants