zmq test: fix sync-up by matching notification to generated block #21310

theStack · 2021-02-27T18:45:56Z

This is a follow-up PR for #21008, fixes #21216.

In the course of investigating the problem with jnewbery (analyzing the Cirrus log https://cirrus-ci.com/task/4660108304056320), it turned out that the "sync up" procedure of repeatedly generating a block and waiting for a notification with timeout is too brittle in its current form, as the following scenario could happen:

generate block A
receive notification, timeout happens => repeat procedure
generate block B
node publishes block A notification
receive notification, we receive the one caused by block A (!!!) => sync-up procedure is completed
node publishes block B notification
the actual test starts
on the first notification reception, the one caused by block B is received, rather than the one actually caused by test code => assertion failure

This change in the PR ensures that after each test block generation, we wait for the notification that is actually caused by that block and ignore others from possibly earlier blocks. The matching is kind of ugly, it assumes that one out of four components in the block is contained in the notification: the block hash, the tx id, the raw block data or the raw transaction data. (Unfortunately we have to support all publisher topics.)

I'm aware that this is quite a lot of code now only for establishing a robust test setup. OTOH I wouldn't know of a better method right now, suggestions are very welcome.

Note for potential reviewers: for both reproducing the issue on master branch and verifying on PR branch, one can simply generate two blocks in the sync-up procedure rather than one.

jonatack

Approach ACK, thanks for working on this.

some suggestions

-# Helper class for setting up a ZMQ test via the "sync up" procedure.
-# Generates a block on the specified node on instantiation and provides a method to check
-# whether a ZMQ notification matches, i.e. the event was caused by this generated block.
-# Assumes that a notification either contains the generated block's hash, it's
-# (coinbase) transaction id, the raw block or raw transaction data.
 class ZMQTestSetupBlock:
+    """Helper class for setting up a ZMQ test via the "sync up" procedure.
+    Generates a block on the specified node on instantiation and provides a
+    method to check whether a ZMQ notification matches, i.e. the event was
+    caused by this generated block. Assumes that a notification either contains
+    the generated block's hash, it's (coinbase) transaction id, the raw block or
+    raw transaction data.
+    """
+
     def __init__(self, node):
         self.block_hash = node.generate(1)[0]
         coinbase = node.getblock(self.block_hash, 2)['tx'][0]
@@ -76,10 +79,12 @@ class ZMQTestSetupBlock:
         self.raw_block = node.getblock(self.block_hash, 0)
 
     def caused_notification(self, notification):
-        return self.block_hash in notification or \
-               self.tx_hash in notification or \
-               self.raw_block in notification or \
-               self.raw_tx in notification
+        return (
+            self.block_hash in notification
+            or self.tx_hash in notification
+            or self.raw_block in notification
+            or self.raw_tx in notification
+        )
 
 
 class ZMQTest (BitcoinTestFramework):
@@ -136,7 +141,7 @@ class ZMQTest (BitcoinTestFramework):
             for sub in subscribers:
                 try:
                     while not test_block.caused_notification(sub.receive().hex()):
-                        self.log.debug("Ignoring sync-up notification for previoulsy generated block.")
+                        self.log.debug("Ignoring sync-up notification for previously generated block.")
                 except zmq.error.Again:
                     self.log.debug("Didn't receive sync-up notification, trying again.")
                     recv_failed = True
@@ -361,7 +366,7 @@ class ZMQTest (BitcoinTestFramework):
             block_count = self.nodes[0].getblockcount()
             best_hash = self.nodes[0].getbestblockhash()
             self.nodes[0].invalidateblock(best_hash)
-            sleep(2) # Bit of room to make sure transaction things happened
+            sleep(2)  # Bit of room to make sure transaction things happened
 
             # Make sure getrawmempool mempool_sequence results aren't "queued" but immediately reflective
             # of the time they were gathered.
@@ -410,8 +415,8 @@ class ZMQTest (BitcoinTestFramework):
             assert_equal(label, "A")
             # More transactions to be simply mined
             for i in range(len(more_tx)):
-                    assert_equal((more_tx[i], "A", mempool_seq), seq.receive_sequence())
-                    mempool_seq += 1
+                assert_equal((more_tx[i], "A", mempool_seq), seq.receive_sequence())
+                mempool_seq += 1
             # Bumped by rbf
             assert_equal((orig_txid, "R", mempool_seq), seq.receive_sequence())
             mempool_seq += 1
@@ -426,7 +431,7 @@ class ZMQTest (BitcoinTestFramework):
             assert_equal((orig_txid_2, "A", mempool_seq), seq.receive_sequence())
             mempool_seq += 1
             self.nodes[0].generatetoaddress(1, ADDRESS_BCRT1_UNSPENDABLE)
-            self.sync_all() # want to make sure we didn't break "consensus" for other tests
+            self.sync_all()  # want to make sure we didn't break "consensus" for other tests
 
     def test_mempool_sync(self):

It turned out that the "sync up" procedure of repeatedly generating a block and waiting for a notification once with timeout is too naive in its current form, as the following scenario could happen: - generate block A - receive notification, timeout happens -> repeat procedure - generate block B - node publishes block A notification - receive notification, we receive the one caused by block A -> sync-up procedure is completed - node publishes block B - the actual test starts - on the first notification reception, one caused by block B is received, rather than the one actually caused by test code, leading to failure This change ensures that after each test block generation, we wait for the notification that is actually caused by that block and ignore others from possibly earlier blocks. Co-authored-by: Jon Atack <jon@atack.com>

theStack · 2021-03-01T00:41:57Z

Force-pushed with all changes suggested by @jonatack (thanks for catching the typo and the style fixups, looks much better now! 🚀 ).

maflcko · 2021-03-02T10:30:03Z

Concept ACK 8a8c638

Thanks!

…o gener…

DrahtBot added the Tests label Feb 27, 2021

theStack force-pushed the 2021-zmq-fix_sync_up_procedure branch from fc1bcab to b387970 Compare February 27, 2021 20:01

jonatack reviewed Feb 27, 2021

View reviewed changes

theStack force-pushed the 2021-zmq-fix_sync_up_procedure branch from b387970 to 8a8c638 Compare March 1, 2021 00:34

maflcko merged commit cfce346 into bitcoin:master Mar 2, 2021

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Mar 2, 2021

Merge bitcoin#21310: zmq test: fix sync-up by matching notification t…

4214c9c

…o gener…

adamjonas mentioned this pull request Mar 2, 2021

Fix zmq test flakiness #20934

Closed

bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

zmq test: fix sync-up by matching notification to generated block #21310

zmq test: fix sync-up by matching notification to generated block #21310

Uh oh!

theStack commented Feb 27, 2021 •

edited

Loading

Uh oh!

jonatack left a comment

Uh oh!

theStack commented Mar 1, 2021

Uh oh!

maflcko commented Mar 2, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zmq test: fix sync-up by matching notification to generated block #21310

zmq test: fix sync-up by matching notification to generated block #21310

Uh oh!

Conversation

theStack commented Feb 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack left a comment

Choose a reason for hiding this comment

Uh oh!

theStack commented Mar 1, 2021

Uh oh!

maflcko commented Mar 2, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

theStack commented Feb 27, 2021 •

edited

Loading