Skip to content

Conversation

@maflcko
Copy link
Member

@maflcko maflcko commented Sep 10, 2018

Take the mempool read lock during reorgs, so that we don't accidentally read an inconsistent mempool.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 10, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #16273 (refactor: Remove unused includes by practicalswift)
  • #16194 (refactor: share blockmetadata with BlockManager by jamesob)
  • #16066 (mempool: Skip estimator if block is older than X by promag)
  • #15192 (Add missing cs_main locks in ThreadImport(...)/Shutdown(...)/gettxoutsetinfo(...)/InitScriptExecutionCache(). Add missing locking annotations. by practicalswift)
  • #13949 (Introduce MempoolObserver interface to break "policy/fees -> txmempool -> policy/fees" circular dependency by Empact)
  • #10443 (Add fee_est tool for debugging fee estimation code by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sdaftuar
Copy link
Member

sdaftuar commented Sep 14, 2018

Is the idea that you're trying to prevent RPC users from having an inconsistent state, like if they call getbestblockhash() and then call getrawmempool(), the latter should be consistent with the former?

That may be reasonable, but I think it'd be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet. I feel like this kind of thing has come up before in a haphazard way and caused issues when we didn't think things through or missed an implication of a change.

(Also, I think it would aid RPC users if they knew what our consistency guarantees were.)

@ryanofsky
Copy link
Contributor

Is the idea that you're trying to prevent RPC users from having an inconsistent state

From the description, I assumed the intent of the PR was to prevent RPCs like getmempoolentry, which acquire mempool.cs but not cs_main, from returning garbage or segfaulting during reorgs because UpdateMempoolForReorg and InvalidateBlock functions were apparently not acquiring mempool.cs while updating the mempool.

But actually it seems like in all the places where the mempool is updated, the code already has both cs_main and mempool.cs locks. So it should be perfectly safe to read the mempool while holding either of the two locks, and trying to acquire both is wasteful.

So maybe this PR is not doing what it intended to do, and instead it would be better just to add comments and fix thread safety annotations somehow. Maybe it would be possible to annotate cs_main and mempool.cs individually as shared locks needed for reading mempool data, and cs_main+mempool.cs both together as an exclusive lock needed for updating mempool data?

@maflcko
Copy link
Member Author

maflcko commented Sep 14, 2018

@ryanofsky With inconsistent I don't mean invalid reads, but rather transactions that are invalid (and still in the process of removal) as of our current block tip.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code utACK fa3e0471a81252e333886b97d8d33bb275061d42. Code change seems perfectly fine, and it could perhaps lead to more consistency in some cases, but like the previous comment #14193 (comment) says, it's unclear which cases are better off after this change or what case motivated this PR. More comments and documentation would be helpful.

@maflcko
Copy link
Member Author

maflcko commented Oct 27, 2018

@sdaftuar, @ryanofsky: Added a test with comments to illustrate my motivation a bit more.

@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch 3 times, most recently from faecda3 to fa27735 Compare October 28, 2018 08:59
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started review:

  • dea5dbf2d5ca0177d3a4be665790288bef40e27e validation: Add missing mempool locks (1/2)
  • fa27735e0f1fd52f03a17d7e9b94a66f310ddfa1 [test] Add test to check mempool consistency in case of reorgs (2/2)

I didn't look closely at the test yet, since it seems pretty complex. If you could a write a one or two sentence comment before the test describing how it works, that might be helpful.

I also think it would be useful to have a comment on the mempool::cs variable. It's definitely being used in a way I wouldn't have expected. From first appearance, it looks like it's just being used to protect mapTx and allow maybe reads & writes from multiple threads:

bitcoin/src/txmempool.h

Lines 488 to 489 in 5d605b2

mutable CCriticalSection cs;
indexed_transaction_set mapTx GUARDED_BY(cs);

But apparently it's supposed to not just provide thread safety, but also guarantee consistency between mempool and chainActive? Would suggest maybe:

    /**
     * Lock to guarantee consistency between mempool and external chain state.
     *
     * Lock must be held whenever mempool is read from or written to and
     * whenever a block is connected. Lock can be released when the mempool
     * contains no transactions that conflict internally or externally. It
     * should be acquired after `cs_main` if both locks need to be held at the
     * same time.
     */
    mutable CCriticalSection cs;

re: #14193 (comment)

I think it'd be good to have some documentation (whether a project document or even a gist explaining our thinking) explaining the consistency guarantees for RPC calls with respect to the mempool, validation, and the wallet

Documentation was added in #14592.

@maflcko
Copy link
Member Author

maflcko commented Nov 21, 2018

The test is trivial, but looks scary because of the boilerplate overhead of the cpp unit tests. Pretty much the only thing it is doing is to check that during a reorg a (simulated) rpc thread (e.g. getrawmempool) would either return a consistent mempool as of before the reorg or a consistent mempool as of after the reorg and never something inconsistent in the middle of a reorg.

@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch from fa0515e to fad0712 Compare November 28, 2018 19:03
@maflcko
Copy link
Member Author

maflcko commented Nov 28, 2018

Added a commit with further comments and documentation.

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in general seems fine to me (mempool state should update atomically after a contiguous batch of chain maintenance and not during) and the test is great, but it might be nice to:

  • explain what the motivations are. There's an argument for RPC client consistency, but IIRC this may also be a prereq for Dandelion?
  • add comments for new lock acquisitions explaining why we're grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).

In practice, ActivateBestChain calls are very short and so I don't think there are any real-world downsides to locking ::mempool.cs for their duration - although I'm curious if this affects mempool access during IBD/reindex.

@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch from fad0712 to fab7d54 Compare November 29, 2018 20:15
@maflcko
Copy link
Member Author

maflcko commented Nov 29, 2018

add comments for new lock acquisitions explaining why we're grabbing a seemingly unrelated lock at a coarse grain (in line with what @ryanofsky is saying).

I think I already did that. Let me know if any of them are still unclear.

@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch from fab7d54 to fae71ff Compare November 30, 2018 18:59
Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK fae71ff

Just a few clarifying questions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We require the pool.cs lock for the duration of this function (instead of just relying on the lock in pool.Expire) because we want to keep the mempool from changing while pcoinsTip modifications happen, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, calling LimitMempoolSize only makes sense after you have written to the mempool, in which case you already have acquired the lock and can keep it for LimitMempoolSize.

@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch from fae7ec9 to 7d68982 Compare June 7, 2019 08:20
@maflcko maflcko force-pushed the Mf1809-valLocksMempool branch from 7d68982 to fa2b083 Compare June 7, 2019 09:09
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK fa2b083 [EDIT: was e284e422e75189794e24fe482819d8b1407857c3, from bad copy and paste]. Changes since last review: rebase after #15976, adding vTxHashes lock annotation, adding new commit dropping mempool lock for nTransactionsUpdated and making it atomic to avoid deadlock between mempool lock and g_best_block_mutex

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, some comments.

CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator) :
nTransactionsUpdated(0), minerPolicyEstimator(estimator)
CTxMemPool::CTxMemPool(CBlockPolicyEstimator* estimator)
: nTransactionsUpdated(0), minerPolicyEstimator(estimator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

// is consistent. That is, it has all txs that were there
// before the reorg.
assert(::mempool.mapTx.size() == txs.size());
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fa2b083

Why continue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the thread should continue and not exit (for now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it exit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would exit when it breaks. I think this is well documented in the test case and I am not sure what you are asking exactly.

If you are asking about cpp syntax, I find cppreference really helpful. E.g. https://en.cppreference.com/w/cpp/language/break

If you are asking about this test case logic, please write a more verbose question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this either. If you have a loop, and an unconditional continue is the last statement in the loop, the continue isn't doing anything and could be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I now I understand that you were asking why I put a redundant continue there.

That was intentional, because I felt it was more clear back when I wrote the test. Happy to remove, but I think we shouldn't spend too much time on style of tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to discuss style! I honestly though something was wrong because of the pointless continue, like missing code or something else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Sorry for the confusion.

// This thread is checking that the mempool either contains all of
// the transactions invalidated by the reorg, or none of them, and
// not some intermediate amount.
while (true) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fa2b083

I think this is flawless, what guarantees that there's a non-atomic change once mempool.cs is released?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes in this pull request should guarantee that this doesn't happen here in the test (you can check it by running the test before the code changes and after), as well as for "real" rpc polling threads. Though, I didn't write a functional test for this.

@laanwj
Copy link
Member

laanwj commented Jul 2, 2019

code review ACK fa2b083

@ryanofsky didn't you ack the wrong commit above ? e284e422e75189794e24fe482819d8b1407857c3 ("Remove getBlockDepth method from Chain::interface") isn't part of this PR

@ryanofsky
Copy link
Contributor

@ryanofsky didn't you ack the wrong commit above

Sorry, bad copy and paste from the command line. Edited #14193 (review) to fix.

@laanwj
Copy link
Member

laanwj commented Jul 2, 2019

Thanks!

@laanwj laanwj merged commit fa2b083 into bitcoin:master Jul 2, 2019
laanwj added a commit that referenced this pull request Jul 2, 2019
fa2b083 [test] Add test to check mempool consistency in case of reorgs (MarcoFalke)
fabeb1f validation: Add missing mempool locks (MarcoFalke)
fa0c9db txpool: Make nTransactionsUpdated atomic (MarcoFalke)

Pull request description:

  Take the mempool read lock during reorgs, so that we don't accidentally read an inconsistent mempool.

ACKs for top commit:
  laanwj:
    code review ACK fa2b083
  ryanofsky:
    utACK fa2b083 [EDIT: was ~e284e422e75189794e24fe482819d8b1407857c3~, from bad copy and paste]. Changes since last review: rebase after #15976, adding vTxHashes lock annotation, adding new commit dropping mempool lock for nTransactionsUpdated and making it atomic to avoid deadlock between mempool lock and g_best_block_mutex

Tree-SHA512: cfe7777993589087753e000e3736d79d320dca412383fb77b56bef8946a04049722bf888c11b6f722adf677165185c7e58b4a269f7c5fa25e84dda375f6c8a7d
@maflcko maflcko deleted the Mf1809-valLocksMempool branch July 2, 2019 14:55
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 3, 2019
fa2b083 [test] Add test to check mempool consistency in case of reorgs (MarcoFalke)
fabeb1f validation: Add missing mempool locks (MarcoFalke)
fa0c9db txpool: Make nTransactionsUpdated atomic (MarcoFalke)

Pull request description:

  Take the mempool read lock during reorgs, so that we don't accidentally read an inconsistent mempool.

ACKs for top commit:
  laanwj:
    code review ACK fa2b083
  ryanofsky:
    utACK fa2b083 [EDIT: was ~e284e422e75189794e24fe482819d8b1407857c3~, from bad copy and paste]. Changes since last review: rebase after bitcoin#15976, adding vTxHashes lock annotation, adding new commit dropping mempool lock for nTransactionsUpdated and making it atomic to avoid deadlock between mempool lock and g_best_block_mutex

Tree-SHA512: cfe7777993589087753e000e3736d79d320dca412383fb77b56bef8946a04049722bf888c11b6f722adf677165185c7e58b4a269f7c5fa25e84dda375f6c8a7d
laanwj added a commit that referenced this pull request Oct 26, 2019
6b6be41 gui: Make polling in ClientModel asynchronous (João Barbosa)

Pull request description:

  After #14193 `ClientModel::updateTimer` can take some time, as such the GUI hangs, like #17112.

  Fixes this by polling in a background thread and updating the GUI asynchronously.

ACKs for top commit:
  laanwj:
    ACK 6b6be41
  Sjors:
    Code review re-ACK 6b6be41; only replaced the scary cast with `{ timer->start(); }`

Tree-SHA512: fd98b0c6535441aee3ee03c48b58b4b1f9bdd172ec6b8150da883022f719df34cabfd4c133412bf410e7f709f7bf1e9ef16dca05ef1f3689d526ceaeee51de38
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 25, 2020
Summary:
This was originally feedback given to me on D6539. Since then I've decided to pass CTxMemPool to the addForBlock method too. This avoids having to declare an `extern CTxMemPool g_mempool` symbol on txmempool.h, and is nicer in general.

Alternatively, I've considered adding a `CTxMemPool& m_pool` to DisconnectedBlockTransactions plus an explicit constructor on the assumption that such objects can only ever have one mempool they work with, but went with this instead for simplicity.

The above changes are unrelated to [[bitcoin/bitcoin#14193 | PR14193]] however, so I'm putting them on a separate diff not to overcomplicate review of the latter.

Test Plan:
  ninja check check-functional

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D7439
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 25, 2020
Summary:
bitcoin/bitcoin@fa0c9db

---

Depends on D7439

Partial backport of Core [[bitcoin/bitcoin#14193 | PR14193]]

Test Plan:
  ninja check-all

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D6537
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 25, 2020
Summary:
bitcoin/bitcoin@fabeb1f

---

Depends on D6537

Partial backport of Core [[bitcoin/bitcoin#14193 | PR14193]]

Test Plan:
  cmake .. -DCMAKE_BUILD_TYPE=Debug
  ninja check-all

Reviewers: #bitcoin_abc, Fabien, deadalnix

Reviewed By: #bitcoin_abc, Fabien, deadalnix

Subscribers: deadalnix, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D6539
jasonbcox pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Sep 25, 2020
…of reorgs

Summary:
bitcoin/bitcoin@fa2b083

Depends on D6539

Concludes backport of Core [[bitcoin/bitcoin#14193 | PR14193]]

Test Plan:
  cmake -DCMAKE_BUILD_TYPE=Debug
  ninja check check-functional

Reviewers: #bitcoin_abc, deadalnix

Reviewed By: #bitcoin_abc, deadalnix

Differential Revision: https://reviews.bitcoinabc.org/D7440
LarryRuane pushed a commit to LarryRuane/zcash that referenced this pull request May 10, 2021
LarryRuane pushed a commit to LarryRuane/zcash that referenced this pull request May 10, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants