mining: add cooldown to createNewBlock() immediately after IBD#34184
mining: add cooldown to createNewBlock() immediately after IBD#34184ryanofsky merged 3 commits intobitcoin:masterfrom
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34184. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
Possible places where named args for integral literals may be used (e.g.
Possible places where comparison-specific test macros should replace generic comparisons:
2026-02-20 15:51:38 |
4a0b81d to
a73a5b6
Compare
62524ac to
e8b01c4
Compare
|
Switched to avoid mock time for the cooldown mechanism (it's possible, but would require additional refactoring). Added a constraint to ignore alternative header branches. This probably doesn't matter in real life, but it does in tests. E.g. the assume_utxo test "Check importing a snapshot where current chain-tip is not an ancestor of the snapshot block but has less work" creates an alternative chain of blocks despite having better headers (but not their blocks). In real life this implies a giant block witholding attack. |
|
ACK e8b01c4 Tested on macOS - Code review:
Nice fix for #33994 |
e8b01c4 to
c02c283
Compare
|
Rebased. |
|
|
||
| WAIT_LOCK(kernel_notifications.m_tip_block_mutex, lock); | ||
| kernel_notifications.m_tip_block_cv.wait_until(lock, cooldown_deadline, [&]() EXCLUSIVE_LOCKS_REQUIRED(kernel_notifications.m_tip_block_mutex) { | ||
| const auto tip_block = kernel_notifications.TipBlock(); |
There was a problem hiding this comment.
Runtime assertion for consistency with WaitAndCreateNewBlock pattern.
| const auto tip_block = kernel_notifications.TipBlock(); | |
| AssertLockHeld(kernel_notifications.m_tip_block_mutex); | |
| const auto tip_block = kernel_notifications.TipBlock(); |
There was a problem hiding this comment.
There's other places where we don't bother with AssertLockHeld for an inline function. It seems a bit overkill.
c02c283 to
9817657
Compare
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK 9817657. This seems like probably the simplest change that can resolve #33994.
I guess I have two lingering concerns about it:
-
This can cause the createNewBlock function to block for potentially a very long time when it didn't before and createNewBlock doesn't currently have any cancellation option other than shutting down the node. So it could make it hard for IPC clients to disconnect cleanly, and I suspect we may want to follow up here by adding some timeout or cancellation mechanism.
-
I'm not very sure about effectiveness of the cooldown strategy, and wouldn't be surprised if there were cases where it didn't work well but it does seem like a good starting point, and pretty conservative.
9817657 to
b57da68
Compare
…le schema change) Adding a context parameter ensures that these methods are run in their own thread and don't block other calls. They were missing for: - createNewBlock() - checkBlock() The missing parameters were first pointed out by plebhash in bitcoin#33575 (comment) and adding them should prevent possible performance problems and lockups, especially with bitcoin#34184 which can make the createNewBlock method block for a long time before returning. It would be straightforward to make this change in a backward compatible way (bitcoin#34184 (comment)) but nice to not need to go through the trouble. Warning: This is an intermediate, review-only commit. Binaries built from it should not be distributed or used to connect to other clients or servers. It makes incompatible changes to the `mining.capnp` schema without updating the `Init.makeMining` version, causing binaries to advertise support for a schema they do not actually implement. Mixed versions may therefore exchange garbage requests/responses instead of producing clear errors. The final commit in this series bumps the mining interface number to ensure mismatches are detected. git-bisect-skip: yes Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
Both waitTipChanged() and createNewBlock() can take a long time to return. Add a way for clients to interrupt them. The new m_interrupt_mining is safely accessed with a lock on m_tip_block_mutex, but it has no guard annotation. A more thorough solution is discussed here: bitcoin#34184 (comment)
6428ed1 to
1e02888
Compare
|
Rebased after #34568 landed, ready for review again. Should be tagged for v31. |
At startup, if the needs to catch up, connected mining clients will receive a flood of new templates as new blocks are connected. Fix this by adding a cooldown argument to createNewBlock(). When set to true, block template creation is briefly paused while the best header chain is ahead of the tip. This wait only happens when the best header extends the current tip, to ignore competing branches. Additionally, cooldown waits for isInitialBlockDownload() to latch to false, which happens when there is less than a day of blocks left to sync. When cooldown is false createNewBlock() returns immediately. The argument is optional, because many tests are negatively impacted by this mechanism, and single miner signets could end up stuck if no block was mined for a day. The getblocktemplate RPC also opts out, because it would add a delay to each call. Fixes bitcoin#33994
Both waitTipChanged() and createNewBlock() can take a long time to return. Add a way for clients to interrupt them. The new m_interrupt_mining is safely accessed with a lock on m_tip_block_mutex, but it has no guard annotation. A more thorough solution is discussed here: bitcoin#34184 (comment)
1e02888 to
fcaec25
Compare
|
Updated PR description to remove reference to the merged #34568. Adjusted |
…le schema change) Adding a context parameter ensures that these methods are run in their own thread and don't block other calls. They were missing for: - createNewBlock() - checkBlock() The missing parameters were first pointed out by plebhash in bitcoin#33575 (comment) and adding them should prevent possible performance problems and lockups, especially with bitcoin#34184 which can make the createNewBlock method block for a long time before returning. It would be straightforward to make this change in a backward compatible way (bitcoin#34184 (comment)) but nice to not need to go through the trouble. Warning: This is an intermediate, review-only commit. Binaries built from it should not be distributed or used to connect to other clients or servers. It makes incompatible changes to the `mining.capnp` schema without updating the `Init.makeMining` version, causing binaries to advertise support for a schema they do not actually implement. Mixed versions may therefore exchange garbage requests/responses instead of producing clear errors. The final commit in this series bumps the mining interface number to ensure mismatches are detected. git-bisect-skip: yes Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
| header_only_peer.peer_disconnect() | ||
| self.connect_nodes(0, 1) | ||
| self.sync_all() | ||
|
|
There was a problem hiding this comment.
In commit "mining: add cooldown argument to createNewBlock()" (a11297a)
The existing test verifies that createNewBlock() blocks while headers are ahead of the tip.
This proposed follow-up test the complementary property, that it wakes up promptly once submitblock advances the tip, rather than waiting for the cooldown timer to expire on its own.
index 5095583a0e..067e2d9217 100755
--- a/test/functional/interface_ipc_mining.py
+++ b/test/functional/interface_ipc_mining.py
@@ -171,6 +171,32 @@ class IPCMiningTest(BitcoinTestFramework):
# spurious failures.
assert_greater_than_or_equal(time.time() - start, 0.9)
+ self.log.debug("createNewBlock() should wake up immediately after sync")
+ results = {}
+
+ async def create_block_with_timing():
+ bg_start = time.time()
+ res = await mining.createNewBlock(ctx, self.default_block_create_options)
+ results['duration'] = time.time() - bg_start
+ results['success'] = res._has("result")
+
+ task = asyncio.create_task(create_block_with_timing())
+ await asyncio.sleep(0.5)
+ assert not task.done(), "createNewBlock() returned early — cooldown did not engage"
+ block_hex = self.nodes[1].getblock(node1_block_hash, False)
+ self.nodes[0].submitblock(block_hex)
+
+ try:
+ await asyncio.wait_for(task, timeout=10.0)
+ except asyncio.TimeoutError:
+ task.cancel()
+ raise AssertionError("createNewBlock() did not wake up within 10s after submitblock")
+
+ assert_equal(results['success'], True)
+ assert_greater_than_or_equal(results['duration'], 0.5)
+There was a problem hiding this comment.
@enirox001 would you like to open a PR with this test and the other documentation followup?
There was a problem hiding this comment.
Yes, I'll put that together.
As @ryanofsky suggested in #34422, I'm going to combine this test addition with a quick refactor of make_mining_ctx to keep the test suite clean and avoid duplication. I'll open the follow-up PR shortly
| return std::make_unique<BlockTemplateImpl>(assemble_options, BlockAssembler{chainman().ActiveChainstate(), context()->mempool.get(), assemble_options}.CreateNewBlock(), m_node); | ||
| } | ||
|
|
||
| void interrupt() override |
There was a problem hiding this comment.
In commit "mining: add interrupt()" (1e82fa4)
nano nit: The current naming is quite similar to interruptWait(). Explicitly documenting that this specifically targets the createNewBlock wait loop clarifies that these control two different internal primitives.
There was a problem hiding this comment.
Instead of documenting, we should flip it the other way around.
interrupt, which is the generic function that toggles the boolean value.
Then interruptWait for waitNext, and interruptMining for Mining.
There was a problem hiding this comment.
Sounds reasonable. Ultimately though I'd like to drop all these separate interrupt methods and instead add interfaces::Interrupt C++ arguments to the waiting methods that would let them detect and handle cancellations done with cap'n proto's native cancellation mechanism
| // is useful in general, it's gated behind the cooldown argument, | ||
| // because on regtest and single miner signets this would wait | ||
| // forever if no block was mined in the past day. | ||
| while (chainman().IsInitialBlockDownload()) { |
There was a problem hiding this comment.
In "mining: add cooldown argument to createNewBlock()" a11297a
IIUC, when a chainstate is loaded via loadtxoutset, the behaviour is that you keep connecting on top of the active loaded chainstate and cooldown based on it.
| return std::make_unique<BlockTemplateImpl>(assemble_options, BlockAssembler{chainman().ActiveChainstate(), context()->mempool.get(), assemble_options}.CreateNewBlock(), m_node); | ||
| } | ||
|
|
||
| void interrupt() override |
There was a problem hiding this comment.
Instead of documenting, we should flip it the other way around.
interrupt, which is the generic function that toggles the boolean value.
Then interruptWait for waitNext, and interruptMining for Mining.
| ---------- | ||
|
|
||
| - `Mining.createNewBlock` now has a `cooldown` behavior (enabled by default) | ||
| that waits for IBD to finish and for the tip to catch up. This usually |
There was a problem hiding this comment.
In fcaec25 "doc: release note for IPC cooldown and interrupt"
nit: No guarantee here, mention that it opportunistically waits for IBD to finish.
Sync capnp proto files with Bitcoin Core master (e09b81638ba1). bitcoin/bitcoin#33819 (mining: getCoinbase() returns struct instead of raw tx) mining.capnp: - BlockTemplate.getCoinbaseTx now returns CoinbaseTx struct (was Data) - Added CoinbaseTx struct with fields: tx, txFee, totalFee, blockReservedWeight, coinbaseOutputMaxAdditionalSigops, defaultSignetChallenge, defaultWitnessCommitment - Added constants: maxMoney, maxDouble, defaultBlockReservedWeight, defaultCoinbaseOutputMaxAdditionalSigops - Added default values to all struct fields (BlockTemplateOptions, Amount, CoinbaseTx) bitcoin/bitcoin#34568 (mining: Break compatibility with existing IPC mining clients) init.capnp: - makeMining renumbered from @2 to @3 - Added deprecated makeMiningOld2 @2 placeholder (Cap'n Proto requires sequential ordinals, so this cannot be removed) mining.capnp: - Mining: removed getCoinbaseCommitment and getWitnessCommitmentIndex - Mining: added interrupt @6 - BlockTemplate: methods renumbered (@5 through @9) bitcoin/bitcoin#34184 (mining: add cooldown to createNewBlock() immediately after IBD) mining.capnp: - Mining.createNewBlock: added context @3, cooldown @4 params - Mining.checkBlock: added context @2 param - Mining.createNewBlock: timeout now has default value (60000000) Also adds make_mining_old2_rejected integration test verifying that the server returns an error for the deprecated method.
Sync capnp proto files with Bitcoin Core master (e09b81638ba1). bitcoin/bitcoin#33819 (mining: getCoinbase() returns struct instead of raw tx) mining.capnp: - BlockTemplate.getCoinbaseTx now returns CoinbaseTx struct (was Data) - Added CoinbaseTx struct with fields: tx, txFee, totalFee, blockReservedWeight, coinbaseOutputMaxAdditionalSigops, defaultSignetChallenge, defaultWitnessCommitment - Added constants: maxMoney, maxDouble, defaultBlockReservedWeight, defaultCoinbaseOutputMaxAdditionalSigops - Added default values to all struct fields (BlockTemplateOptions, Amount, CoinbaseTx) bitcoin/bitcoin#34568 (mining: Break compatibility with existing IPC mining clients) init.capnp: - makeMining renumbered from @2 to @3 - Added deprecated makeMiningOld2 @2 placeholder (Cap'n Proto requires sequential ordinals, so this cannot be removed) mining.capnp: - Mining: removed getCoinbaseCommitment and getWitnessCommitmentIndex - Mining: added interrupt @6 - BlockTemplate: methods renumbered (@5 through @9) bitcoin/bitcoin#34184 (mining: add cooldown to createNewBlock() immediately after IBD) mining.capnp: - Mining.createNewBlock: added context @3, cooldown @4 params - Mining.checkBlock: added context @2 param - Mining.createNewBlock: timeout now has default value (60000000) Also adds make_mining_old2_rejected integration test verifying that the server returns an error for the deprecated method.
Sync capnp proto files with Bitcoin Core master (e09b81638ba1). bitcoin/bitcoin#33819 (mining: getCoinbase() returns struct instead of raw tx) mining.capnp: - BlockTemplate.getCoinbaseTx now returns CoinbaseTx struct (was Data) - Added CoinbaseTx struct with fields: tx, txFee, totalFee, blockReservedWeight, coinbaseOutputMaxAdditionalSigops, defaultSignetChallenge, defaultWitnessCommitment - Added constants: maxMoney, maxDouble, defaultBlockReservedWeight, defaultCoinbaseOutputMaxAdditionalSigops - Added default values to all struct fields (BlockTemplateOptions, Amount, CoinbaseTx) bitcoin/bitcoin#34568 (mining: Break compatibility with existing IPC mining clients) init.capnp: - makeMining renumbered from @2 to @3 - Added deprecated makeMiningOld2 @2 placeholder (Cap'n Proto requires sequential ordinals, so this cannot be removed) mining.capnp: - Mining: removed getCoinbaseCommitment and getWitnessCommitmentIndex - Mining: added interrupt @6 - BlockTemplate: methods renumbered (@5 through @9) bitcoin/bitcoin#34184 (mining: add cooldown to createNewBlock() immediately after IBD) mining.capnp: - Mining.createNewBlock: added context @3, cooldown @4 params - Mining.checkBlock: added context @2 param - Mining.createNewBlock: timeout now has default value (60000000) Also adds make_mining_old2_rejected integration test verifying that the server returns an error for the deprecated method.
As reported in #33994, connected mining clients will receive a flood of new templates if the node is still going through IBD or catching up on the last 24 hours. This PR fixes that using an optional cooldown mechanism, only applied to
createNewBlock().First, cooldown waits for IBD. Then, as the tip keeps moving forward, it waits a few seconds to see if the tip updated. If so, it restarts the timer and waits again. The trade-offs for this mechanism are explained below.
Because this PR changes
createNewBlock()from a method that returns quickly to one that can block for minutes, we rely on #34568 to fix a bug in our.capnpdefinition, adding the missingcontexttocreateNewBlock(andcheckBlock).The second commit then adds an
interrupt()method so that clients can cleanly disconnect.Rationale
The cooldown argument is optional, and not used by internal non-IPC code, for two reasons:
The reason it's only applied to
createNewBlock()is that this is the first method called by clients;waitNext()is a method on the interface returned bycreateNewBlock(), at which point the cooldown is done.After IBD, we wait N seconds if the header is N blocks ahead of the tip, with a minimum of 3 and a maximum of 20 seconds. The minimum waiting time is short enough that it shouldn't be annoying or confusing for someone manually starting up a client. While the maximum should be harmless if it happens spuriously (which it shouldn't).
If the minimum wait is too short, clients get a burst of templates, as observed in the original issue. We can't entirely rule this out without a lot of additional complexity (like scanning our own log file for heuristics). This PR should make it a lot less likely, and thanks to the IBD wait also limit it to one day worth of blocks (
-maxtipage).Some test runs on an M4 MacBook Pro, where I had a node catch up on the last few days worth of blocks:
As the chart shows, sometimes it takes longer than 3 seconds. But it turns out that in all those cases there were quite a few headers ahead of the tip. It also demonstrates that it's important to first wait for IBD, because it's less likely a random tip update takes longer than 20 seconds.