Skip to content

Conversation

@maflcko
Copy link
Member

@maflcko maflcko commented Mar 7, 2023

valgrind will incur a slowdown of at least 2, so increase the default timeout factor.

This should reduce the number of reported issues. See also #27112 (comment)

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 7, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK fanquake
Concept ACK Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@DrahtBot DrahtBot added the Tests label Mar 7, 2023
@fanquake
Copy link
Member

fanquake commented Mar 7, 2023

Concept ACK - will test

@fanquake
Copy link
Member

fanquake commented Mar 8, 2023

I'm still seeing #27208 with this change (fa27cf4). See here for the full combined log:

...............................................................................................
162/256 - p2p_ibd_stalling.py failed, Duration: 126 s

stdout:
2023-03-08T04:33:24.745000Z TestFramework (INFO): PRNG seed is: 3703385557670057473
2023-03-08T04:33:24.745000Z TestFramework (INFO): Initializing test directory /home/ubuntu/ci_scratch/ci/scratch/test_runner/test_runner_₿_🏃_20230308_012459/p2p_ibd_stalling_97
2023-03-08T04:33:38.260000Z TestFramework (INFO): Prepare blocks without sending them to the node
2023-03-08T04:33:38.356000Z TestFramework (INFO): Check that a staller does not get disconnected if the 1024 block lookahead buffer is filled
2023-03-08T04:35:17.908000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
        test_function = lambda: self.is_connected
'''
2023-03-08T04:35:17.908000Z TestFramework (ERROR): Assertion failed
Traceback (most recent call last):
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
    self.run_test()
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_ibd_stalling.py", line 77, in run_test
    peers.append(node.add_outbound_p2p_connection(P2PStaller(stall_block), p2p_idx=id, connection_type="outbound-full-relay"))
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_node.py", line 663, in add_outbound_p2p_connection
    p2p_conn.wait_for_connect()
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 467, in wait_for_connect
    wait_until_helper(test_function, timeout=timeout, lock=p2p_lock)
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 281, in wait_until_helper
    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
AssertionError: Predicate ''''
        test_function = lambda: self.is_connected
''' not true after 60.0 seconds
2023-03-08T04:35:27.925000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
        wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
'''
[node 0] Cleaning up leftover process

stderr:
Traceback (most recent call last):
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/p2p_ibd_stalling.py", line 164, in <module>
    P2PIBDStallingTest().main()
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 155, in main
    exit_code = self.shutdown()
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 310, in shutdown
    self.network_thread.close()
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/p2p.py", line 603, in close
    wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
  File "/home/ubuntu/ci_scratch/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 281, in wait_until_helper
    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
AssertionError: Predicate ''''
        wait_until_helper(lambda: not self.network_event_loop.is_running(), timeout=timeout)
''' not true after 10.0 seconds
Task was destroyed but it is pending!
task: <Task pending name='Task-9' coro=<BaseSelectorEventLoop._accept_connection2() running at /usr/lib/python3.10/asyncio/selector_events.py:193>>
/usr/lib/python3.10/asyncio/base_events.py:671: RuntimeWarning: coroutine 'BaseSelectorEventLoop._accept_connection2' was never awaited
......
p2p_ibd_stalling.py                                    | ✖ Failed  | 126 s

ALL                                                    | ✖ Failed  | 42880 s (accumulated) 
Runtime: 11402 s

@maflcko
Copy link
Member Author

maflcko commented Mar 8, 2023

Yeah, this doesn't affect the valgrind CI system. Only test runs with --valgrind passed (and no timeout factor).

@fanquake
Copy link
Member

fanquake commented Mar 8, 2023

Yeah, this doesn't affect the valgrind CI system.

Ok. Can we consolidate this at all? The disparity is somewhat confusing because you would assume that the Valgrind CI job would use the dedicated --valgrind option provided by the test framework.

@maflcko
Copy link
Member Author

maflcko commented Mar 8, 2023

Not sure if that makes sense. This would require special casing the non-test binaries bitcoind/-cli/-tx/-util etc in the valgrind wrapper. Also, if someone wants to interactively re-run a single test inside the CI pod, they'd have to pass --valgrind. Maybe open a new discussion thread or pull request?

@Sjors
Copy link
Member

Sjors commented Mar 8, 2023

Concept ACK. On my test machine --valgrind --timeout-factor=4 works, whereas without --timeout-factor it doesn't. Haven't tried a lower value.

I also have to set --jobs very low (e.g. 5 on a 32 thread CPU, well below the point where it uses ~100% CPU), or I still get timeouts and other errors (e.g. no RPC connection in p2p_tx_download.py). So perhaps 4 is not enough if you want to max-out a machine, but fine for CI. (Also, I compiled without BDB)

Even with that some tests fail, but they do so very quickly so that seems unrelated (see #27228).

Copy link
Member

@fanquake fanquake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK fa27cf4 - I still see at least one actual issue when running the functional tests under --valgrind (outside the CI system), but will follow up separately with that. Increasing the timeout here seems fine.

@fanquake fanquake merged commit f088949 into bitcoin:master Mar 13, 2023
@maflcko maflcko deleted the 2303-test-valgrind-avoid-timeout-🎛 branch March 13, 2023 15:25
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Mar 13, 2023
fa27cf4 test: Default timeout factor to 4 under --valgrind (MarcoFalke)

Pull request description:

  valgrind will incur a slowdown of at least 2, so increase the default timeout factor.

  This should reduce the number of reported issues. See also bitcoin#27112 (comment)

ACKs for top commit:
  fanquake:
    ACK fa27cf4 - I still see at least one actual issue when running the functional tests under `--valgrind` (outside the CI system), but will follow up separately with that. Increasing the timeout here seems fine.

Tree-SHA512: 4467559a3bfd98f5735f300f6ed54c68f951191d65a2b1294d71d72cc5d0864964de562d5dfa0a4855fc541ccb269a538b7aeb3d408d2d012a5369513397c395
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Mar 15, 2023
fa27cf4 test: Default timeout factor to 4 under --valgrind (MarcoFalke)

Pull request description:

  valgrind will incur a slowdown of at least 2, so increase the default timeout factor.

  This should reduce the number of reported issues. See also bitcoin#27112 (comment)

ACKs for top commit:
  fanquake:
    ACK fa27cf4 - I still see at least one actual issue when running the functional tests under `--valgrind` (outside the CI system), but will follow up separately with that. Increasing the timeout here seems fine.

Tree-SHA512: 4467559a3bfd98f5735f300f6ed54c68f951191d65a2b1294d71d72cc5d0864964de562d5dfa0a4855fc541ccb269a538b7aeb3d408d2d012a5369513397c395
@bitcoin bitcoin locked and limited conversation to collaborators Mar 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants