Conversation


@glozow glozow commented May 15, 2024

See #27463 for full project tracking.

This contains the first few commits of #30110, which require some thinking about thread safety in review.

  • Introduce a new m_tx_download_mutex which guards the transaction download data structures including m_txrequest, the rolling bloom filters, and m_orphanage. Later this should become the mutex guarding TxDownloadManager.
    • m_txrequest doesn't need to be guarded using cs_main anymore
    • m_recent_confirmed_transactions doesn't need its own lock anymore
    • m_orphanage doesn't need its own lock anymore
  • Adds a new ValidationInterface event, ActiveTipChanged, which is a synchronous callback whenever the tip of the active chainstate changes.
  • Flush m_recent_rejects and m_recent_rejects_reconsiderable on ActiveTipChanged just once instead of checking the tip every time AlreadyHaveTx is called. This should speed up calls to that function (no longer comparing a block hash each time) and removes the need to lock cs_main every time it is called.
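The flush-on-tip-change idea can be sketched in simplified C++. This is an illustration only, not Bitcoin Core's actual code: std::set stands in for the rolling bloom filters, and the class and method names mirror the PR's but are otherwise hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Simplified sketch: rejection filters are flushed once per tip change via a
// synchronous callback, instead of comparing the tip hash on every
// AlreadyHaveTx call. Names are stand-ins for the real structures.
class TxDownloadState {
    std::mutex m_tx_download_mutex;            // guards everything below
    std::set<std::string> m_recent_rejects;    // stand-in for a rolling bloom filter
    std::set<std::string> m_recent_confirmed;  // likewise

public:
    // Synchronous ValidationInterface-style callback: runs immediately when
    // the active tip changes, so stale rejections never leak into AlreadyHaveTx.
    void ActiveTipChanged()
    {
        std::lock_guard<std::mutex> lock(m_tx_download_mutex);
        m_recent_rejects.clear();
    }

    void AddReject(const std::string& txid)
    {
        std::lock_guard<std::mutex> lock(m_tx_download_mutex);
        m_recent_rejects.insert(txid);
    }

    // No tip-hash comparison (and no cs_main) needed here any more: the
    // filter is guaranteed fresh because ActiveTipChanged already flushed it.
    bool AlreadyHaveTx(const std::string& txid)
    {
        std::lock_guard<std::mutex> lock(m_tx_download_mutex);
        return m_recent_rejects.count(txid) > 0 || m_recent_confirmed.count(txid) > 0;
    }
};
```

The key property is that the callback is synchronous: a transaction rejected for a now-met timelock becomes acceptable the moment the block connects, not whenever a queued notification happens to run.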

Motivation:

  • These data structures need synchronization. While we are holding m_tx_download_mutex, these should hold:
    • a tx hash in m_txrequest is not also in m_orphanage
    • a tx hash in m_txrequest is not also in m_recent_rejects or m_recent_confirmed_transactions
    • In the future, orphan resolution tracking should also be synchronized. If a tx has an entry in the orphan resolution tracker, it is also in m_orphanage, and not in m_txrequest, etc.
  • Currently, cs_main is used to e.g. sync accesses to m_txrequest. We should not broaden the scope of things it locks.
  • Currently, we need to know the current chainstate every time we call AlreadyHaveTx so we can decide whether we should update it. Every call compares the current tip hash with hashRecentRejectsChainTip. It is more efficient to have a validation interface callback that updates the rejection filters whenever the chain tip changes.
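The invariants listed above can be illustrated with a toy check, using std::set stand-ins for the real structures (names borrowed from the PR, everything else hypothetical):

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical illustration: while m_tx_download_mutex is held, a tx hash
// should live in at most one of the download structures. These std::set
// members are simplified stand-ins, not the real types.
struct TxDownloadView {
    std::set<std::string> m_txrequest;
    std::set<std::string> m_orphanage;
    std::set<std::string> m_recent_rejects;

    // Returns true iff the invariants hold for a given tx hash. In the real
    // code this property is maintained implicitly by performing all state
    // transitions (announce -> request -> reject/orphan) under one mutex.
    bool InvariantsHold(const std::string& txhash) const
    {
        if (m_txrequest.count(txhash)) {
            return !m_orphanage.count(txhash) && !m_recent_rejects.count(txhash);
        }
        return true;
    }
};
```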

@glozow glozow added the P2P label May 15, 2024

DrahtBot commented May 15, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK instagibbs, dergoegge, theStack, hebasto
Concept ACK mzumsande
Stale ACK vasild, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30413 (fuzz: Version handshake by dergoegge)
  • #30233 (refactor: move m_is_inbound out of CNodeState by sr-gi)
  • #30214 (refactor: Improve assumeutxo state representation by ryanofsky)
  • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@glozow glozow mentioned this pull request May 15, 2024
@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from c280fe7 to 5ba125d Compare May 15, 2024 15:42
@ajtowns ajtowns left a comment

I'm not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn't miss an edge case. Rest looks good to me.

@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from 5ba125d to 063fc63 Compare May 16, 2024 11:21

glozow commented May 16, 2024

I'm not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn't miss an edge case.

I (locally) split the commit into (1) update on validation interface callback and asserting that hashRecentRejectsChainTip is equal to the chain tip whenever we call AlreadyHaveTx + (2) removing hashRecentRejectsChainTip. I think if we see that everything runs fine with (1) it's probably correct.

I've (locally) hit a bug though, so will figure out what's wrong and then push.


ajtowns commented May 16, 2024

I'm not sure how to be confident that the UpdateBlockTipSync / hashRecentRejectsChainTip would be correct and didn't miss an edge case.

I (locally) split the commit into (1) update on validation interface callback and asserting that hashRecentRejectsChainTip is equal to the chain tip whenever we call AlreadyHaveTx + (2) removing hashRecentRejectsChainTip. I think if we see that everything runs fine with (1) it's probably correct.

I've (locally) hit a bug though, so will figure out what's wrong and then push.

FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?), but it wasn't reliable: it only occurred when I ran many tests in parallel, not when I reran that test on its own. The bug will only trigger if the tip is updated via some other path and AlreadyHave is called before the tip is further updated via the expected path, so (1) doesn't seem like a terribly reliable check to me.

@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from 063fc63 to a1f36eb Compare May 16, 2024 14:14

glozow commented May 16, 2024

FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?)

Was it mempool_packages.py maybe? Mine tripped there on invalidateblock when I was adding UpdatedBlockTip to InvalidateBlock, and it was a lock ordering problem. My hierarchy is cs_main -> tx_download_mutex -> mempool.cs.

Hm. The alternative is to hold cs_main and tell txdownloadman the current chain tip pretty much all the time...
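The lock-ordering problem described here can be sketched abstractly: with the hierarchy cs_main -> m_tx_download_mutex -> mempool.cs, every code path must acquire the mutexes outermost-first, or two threads taking them in opposite orders can deadlock. A minimal stand-alone sketch with plain std::mutex (not the real annotated Mutex/RecursiveMutex types):

```cpp
#include <cassert>
#include <mutex>

// Stand-ins for the three locks in the hierarchy mentioned above.
std::mutex cs_main;
std::mutex m_tx_download_mutex;
std::mutex mempool_cs;

int g_processed{0};

void ProcessTipChange()
{
    // Correct order per the hierarchy: outermost lock first.
    std::lock_guard<std::mutex> l1(cs_main);
    std::lock_guard<std::mutex> l2(m_tx_download_mutex);
    std::lock_guard<std::mutex> l3(mempool_cs);
    // ... flush rejection filters, update request tracker, etc. ...
    ++g_processed;
}
```

A path that grabbed mempool.cs first and then tried to take m_tx_download_mutex would invert this order, which is exactly the kind of violation a lock-order debug check trips on.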


ajtowns commented May 16, 2024

FWIW I tried something similar, and got an assertion failure in one of the mempool functional tests (maybe mempool_reorg, and thus due to an invalidateblock call?)

Was it mempool_packages.py maybe? Mine tripped there on invalidateblock when I was adding UpdatedBlockTip to InvalidateBlock, and it was a lock ordering problem. My hierarchy is cs_main -> tx_download_mutex -> mempool.cs.

Ah, yes, I think it was.

@instagibbs

thanks for splitting this up, next on my review docket

@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from a1f36eb to 7c3fb97 Compare May 20, 2024 10:06

glozow commented May 20, 2024

Rebased for #29817

@instagibbs instagibbs left a comment

looking pretty straightforward but I'm not an expert in the current locking setup

will give another pass in a bit

@instagibbs

suggested changes were done, reviewed via git range-diff master 7c3fb97 ef8de26

@instagibbs instagibbs left a comment

utACK ef8de26be5478729328ac9d8a9ad6898351552b6

@glozow glozow left a comment

rebased

@theStack

Concept ACK

@mzumsande mzumsande left a comment

Concept ACK - don't know the txrequest logic very well though.

The PR description should be updated (UpdatedBlockTipSync was renamed).

@glozow glozow changed the title locks: introduce mutex for tx download, flush rejection filters on UpdatedBlockTip locks: introduce mutex for tx download, flush rejection filters once per tip change Jul 16, 2024
glozow added 2 commits July 16, 2024 10:01

We need to synchronize between various tx download structures.
TxRequest does not inherently need cs_main for synchronization, and it's
not appropriate to lock all of the tx download logic under cs_main.

This is a synchronous callback notifying clients of all tip changes.

It allows clients to respond to a new block immediately after it is
connected. The synchronicity is important for things like
m_recent_rejects, in which a transaction's validity can change (rejected
vs accepted) when this event is processed. For example, the transaction
might have a timelock condition that has just been met. This is distinct
from something like m_recent_confirmed_transactions, in which the
validation outcome is the same (valid vs already-have), so it does not
need to be reset immediately.
@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from 17d41e7 to 3f274fb Compare July 16, 2024 09:01

glozow commented Jul 16, 2024

glozow added 5 commits July 16, 2024 10:21

Resetting m_recent_rejects once per block is more efficient than
comparing hashRecentRejectsChainTip with the chain tip every time we
call AlreadyHaveTx. We keep hashRecentRejectsChainTip for now to assert
that updates happen correctly; it is removed in the next commit.

This also means AlreadyHaveTx no longer needs cs_main held.

The TxOrphanage is now guarded externally by m_tx_download_mutex.
@glozow glozow force-pushed the 2024-05-txdownload-mutex branch from 3f274fb to c85acce Compare July 16, 2024 09:22
@instagibbs

reACK c85acce

reviewed via git range-diff master 17d41e7 c85acce

@dergoegge dergoegge left a comment

Code review ACK c85acce

bool HasAllDesirableServiceFlags(ServiceFlags services) const override;
bool ProcessMessages(CNode* pfrom, std::atomic<bool>& interrupt) override
-    EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_recent_confirmed_transactions_mutex, !m_most_recent_block_mutex, !m_headers_presync_mutex, g_msgproc_mutex);
+    EXCLUSIVE_LOCKS_REQUIRED(!m_peer_mutex, !m_most_recent_block_mutex, !m_headers_presync_mutex, g_msgproc_mutex, !m_tx_download_mutex);

@hebasto hebasto Jul 22, 2024


The PeerManagerImpl::ProcessMessages implementation has multiple early return statements. In such a case, the Developer Notes suggest to:

Combine annotations in function declarations with run-time asserts in function definitions

i.e., AssertLockNotHeld(m_tx_download_mutex);

UPD. Same for other places, for instance, PeerManagerImpl::SendMessages.

Member Author

thanks, added in #30507
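The annotation-plus-runtime-assert pattern hebasto references might look like the following sketch. The bool-flag tracking is a hypothetical stand-in for Bitcoin Core's lock-debugging machinery, purely to show why the definition-side assert catches what a compile-time annotation alone might miss across early returns:

```cpp
#include <cassert>
#include <mutex>

std::mutex m_tx_download_mutex;
thread_local bool g_tx_download_mutex_held{false};

// RAII helper that locks the mutex and records that this thread holds it.
// Stand-in for the real lock-tracking infrastructure.
struct TrackedLock {
    std::mutex& m;
    TrackedLock(std::mutex& mu) : m(mu) { m.lock(); g_tx_download_mutex_held = true; }
    ~TrackedLock() { g_tx_download_mutex_held = false; m.unlock(); }
};

// Runtime twin of a negative annotation like
// EXCLUSIVE_LOCKS_REQUIRED(!m_tx_download_mutex) on the declaration.
void AssertTxDownloadMutexNotHeld() { assert(!g_tx_download_mutex_held); }

bool ProcessMessages()
{
    AssertTxDownloadMutexNotHeld();  // run-time assert backing the annotation
    TrackedLock lock(m_tx_download_mutex);
    // ... message handling, possibly with multiple early returns; the RAII
    // guard releases the lock on every exit path ...
    return true;
}
```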

//
// Message: getdata (transactions)
//
LOCK(m_tx_download_mutex);
Member

Do we need to hold m_tx_download_mutex for this call:

bitcoin/src/net_processing.cpp

Lines 6327 to 6328 in c85acce

if (!vGetData.empty())
    MakeAndPushMessage(*pto, NetMsgType::GETDATA, vGetData);
a few lines down?

Member Author

no we don't, changed in #30507
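The fix amounts to narrowing the lock scope so the network send happens outside the critical section. A hedged sketch of the shape of that change (Send and the containers are illustrative stand-ins, not the real net_processing code):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

std::mutex m_tx_download_mutex;
std::vector<std::string> m_requestable{"txA", "txB"};  // guarded by the mutex
std::vector<std::string> g_sent;                       // records the last "send"

// Stand-in for MakeAndPushMessage(*pto, NetMsgType::GETDATA, vGetData).
void Send(const std::vector<std::string>& getdata) { g_sent = getdata; }

void SendGetData()
{
    std::vector<std::string> vGetData;
    {
        // Hold the mutex only while reading/mutating shared download state.
        std::lock_guard<std::mutex> lock(m_tx_download_mutex);
        vGetData = m_requestable;
        m_requestable.clear();
    } // mutex released here, before the potentially slow network push
    if (!vGetData.empty()) Send(vGetData);
}
```

Keeping serialization and socket work out of the locked region shortens the critical section and avoids holding a download-state mutex across unrelated subsystems.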

@theStack theStack left a comment

Light code-review ACK c85acce

Fwiw, I've also spun up a fresh mainnet node instance on a publicly reachable machine running this PR for the last few days (out of IBD since Friday morning) and didn't notice any problems so far.

Comment on lines 2121 to 2129

for (const auto& ptx : pblock->vtx) {
    m_recent_confirmed_transactions.insert(ptx->GetHash().ToUint256());
    if (ptx->HasWitness()) {
        m_recent_confirmed_transactions.insert(ptx->GetWitnessHash().ToUint256());
    }
}
{
    for (const auto& ptx : pblock->vtx) {
        m_txrequest.ForgetTxHash(ptx->GetHash());
        m_txrequest.ForgetTxHash(ptx->GetWitnessHash());
    }
Contributor

nit: as a follow-up, could maybe consolidate those two loops into a single one, as they operate now under the same lock

Member Author

good idea, done in #30507
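The consolidation theStack suggests could look like the sketch below, with (txid, wtxid) pairs and simplified containers standing in for the real block and request-tracker types:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Transactions modeled as (txid, wtxid) pairs; containers are stand-ins.
struct Tx { std::string txid, wtxid; };

std::set<std::string> m_recent_confirmed_transactions;
std::set<std::string> m_txrequest;  // stand-in: ForgetTxHash == erase

void BlockConnected(const std::vector<Tx>& vtx)
{
    // Single pass over the block's transactions: both updates now happen
    // under the same lock, so there is no reason to loop twice.
    for (const auto& tx : vtx) {
        m_recent_confirmed_transactions.insert(tx.txid);
        m_recent_confirmed_transactions.insert(tx.wtxid);
        m_txrequest.erase(tx.txid);
        m_txrequest.erase(tx.wtxid);
    }
}
```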


hebasto commented Jul 22, 2024

  • Introduce a new m_tx_download_mutex which guards the transaction download data structures including m_txrequest, the rolling bloom filters, and m_orphanage.

    • m_orphanage doesn't need its own lock anymore

In the master branch, the design of the TxOrphanage class is thread-safe for all purposes. I don't think that the fact that m_orphanage does not require additional synchronization justifies the change of TxOrphanage class design. The new m_tx_download_mutex guarantees no contention when locking TxOrphanage::m_mutex, which makes the latter cheap.

If you still insist on the TxOrphanage class change, then perhaps document that it is no longer thread-safe and requires external synchronization?


glozow commented Jul 23, 2024

In the master branch, the design of the TxOrphanage class is thread-safe for all purposes. I don't think that the fact that m_orphanage does not require additional synchronization justifies the change of TxOrphanage class design.

See #30111 (comment) for background. TxOrphanage is not a multi-purpose container. It has 1 specific purpose, and its user will need to externally synchronize it with other data structures to fulfill that purpose. It seems it would just slow things down to acquire and release a lock whenever we use it.

perhaps document that it is no longer thread-safe and requires an external synchronization?

I've opened #30507 to add documentation to txorphanage.h, as well as the other suggestions from @hebasto and @theStack (thanks!).

@hebasto hebasto left a comment

ACK c85acce, I have reviewed the code and it looks OK.

@fanquake fanquake merged commit 9607277 into bitcoin:master Jul 24, 2024
@glozow glozow deleted the 2024-05-txdownload-mutex branch July 25, 2024 09:06
fanquake added a commit that referenced this pull request Jul 25, 2024
7c29e55 m_tx_download_mutex followups (glozow)
e543c65 release m_tx_download_mutex before MakeAndPushMessage GETDATA (glozow)
bce5f37 [refactor] change ActiveTipChange to use CBlockIndex ref instead of ptr (glozow)
7cc5ac5 [doc] TxOrphanage is no longer thread-safe (glozow)
6f49548 [refactor] combine block vtx loops in BlockConnected (glozow)

Pull request description:

  Followup to #30111. Includes suggestions:
  - #30111 (comment)
  - #30111 (comment)
  - #30111 (comment)
  - #30111 (comment)
  - #30111 (comment)

ACKs for top commit:
  instagibbs:
    reACK 7c29e55
  theStack:
    re-ACK 7c29e55
  dergoegge:
    reACK 7c29e55

Tree-SHA512: 79a9002d74739367789bbc64bb1d431f4d43a25a7934231e55814c2cb6981c15ef2d8465544ae2a4fbd734d9bed6cc41b37a923938a88cb8fea139523c1e98da
@bitcoin bitcoin locked and limited conversation to collaborators Jan 12, 2026