ci: run some Bitcoin Core CI jobs #253
Conversation
@ryanofsky can you expand the whitelist with (that's what's causing the new workflows to get ignored, see annotations here: https://github.com/bitcoin-core/libmultiprocess/actions/runs/22963039475)
Thanks, the jobs run now, I'll debug them later.
```yaml
- name: Run IPC and miner unit tests
  run: |
    ctest --test-dir build -R "ipc|miner_tests" --output-on-failure --timeout 480
```
Same. Also, could we use the longer `--tests-regex` instead of `-R` to be more self-explanatory?
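For reference, the same step with the long-option spelling might look like this (step name and flags taken from the snippet above; per the CTest documentation, `--tests-regex` is the long form of `-R`):

```yaml
- name: Run IPC and miner unit tests
  run: |
    # --tests-regex is the long-form spelling of -R
    ctest --test-dir build --tests-regex "ipc|miner_tests" --output-on-failure --timeout 480
```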
```shell
sudo apt-get install --no-install-recommends -y ${{ matrix.packages }}
sudo update-alternatives --install /usr/bin/clang++ clang++ "/usr/bin/clang++-${LLVM_VERSION}" 100
sudo update-alternatives --install /usr/bin/clang clang "/usr/bin/clang-${LLVM_VERSION}" 100
sudo update-alternatives --install /usr/bin/llvm-symbolizer llvm-symbolizer "/usr/bin/llvm-symbolizer-${LLVM_VERSION}" 100
```
I wonder if we can split the upstream install script into smaller reusable bash or python scripts, to be used here?
I'm not sure about trying to expose bash scripts as a sort of stable CI API, which would essentially give us an undocumented and untested consumer. Inevitably, changes in our repo will break things, and we'll have another loop of having to re-refactor/change, based on requirements here, that nobody in our repository is accounting for when making changes there.
In general we'll need to wait and see if these Bitcoin Core jobs are maintainable here over the long run. If it becomes more hassle than just opening a draft PR on bitcoin/bitcoin then we might just drop them.
Force-pushed a1e0fb1 to cb8a9e8
All green. Pushed again to wire up ccache.

Well, then this is obviously wrong, because it fails to find any of the countless bugs that still exist. It would be good to find at least one of them. Otherwise, this seems pointless?
I think it makes more sense to try to find the bugs without the new tests. The goal here isn't to catch bugs in newly added deterministic tests, but to catch intermittent bugs in the vanilla (pre-existing) subtree.

I added a third commit for tweaking the frequencies. All three jobs now run the unit and functional tests 200 times.
Test run times: the unit tests are still running; from previous experience, the functional tests took about 1 second each. I'd like to keep the job runtimes under 15 minutes: assuming 3 minutes for setup, 4 minutes for unit tests, and 4 minutes for functional tests, I'll tweak the numbers. (I also forgot to install ccache on macOS.)
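As a rough sanity check on those numbers (all figures below are the assumptions from the comment above — ~1 second per functional test, 200 repetitions — not measurements), the functional-test slice of the budget can be estimated like this:

```python
# Back-of-the-envelope estimate of functional-test wall time.
# Assumptions from the discussion: ~1 s per functional test,
# 2 IPC functional tests per repetition, `jobs` tests run in parallel.
def functional_minutes(runs, tests_per_run=2, secs_per_test=1.0, jobs=4):
    """Estimated wall-clock minutes for `runs` repetitions."""
    return runs * tests_per_run * secs_per_test / jobs / 60

# 200 repetitions at 4 parallel jobs stays well inside a 4-minute slice.
print(round(functional_minutes(200), 1))  # 1.7
```

The estimate is crude (it ignores startup overhead per test), but it shows why 200 repetitions fit comfortably under the 15-minute target.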
Nice. Let's hope one of them fails 😅

OK, none failed :(

I'm lowering the unit tests a bit. Also noticed I forgot to run the
```yaml
run: |
  for _ in $(seq 1 "${TSAN_FUNCTIONAL_TEST_RUNS}"); do
    LD_LIBRARY_PATH="depends/${{ steps.host.outputs.host }}/lib" \
      build/test/functional/test_runner.py --filter='^interface_ipc' --jobs "${PARALLEL}" --timeout-factor=10
```
There are only two tests, so the `--jobs "${PARALLEL}"` won't make any difference; also, this runs sequentially over TSAN_FUNCTIONAL_TEST_RUNS.
Maybe the GHA machines just won't trigger the failures here?
Has anyone reproduced any of the failures locally when running the tests?
Yeah, I got confused there. You can't use `--filter` to run tests multiple times; you have to pass the names in multiple times. The latest push should fix this.
(Well, the next push...)
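A sketch of the fix described here — building up the test-name list by repetition instead of using `--filter` (the test names and the repetition count are illustrative, not the PR's exact values):

```shell
# Build an argument list naming each IPC functional test RUNS times,
# since test_runner.py runs each --filter match only once.
RUNS=200
TESTS=""
for _ in $(seq 1 "$RUNS"); do
  TESTS="$TESTS interface_ipc.py interface_ipc_mining.py"
done
# Hypothetical invocation (not executed in this sketch):
#   build/test/functional/test_runner.py --jobs "$PARALLEL" $TESTS
set -- $TESTS   # word-split into positional args to count them
echo "$#"       # 400 test-name arguments
```

The unquoted `$TESTS` expansion relies on word splitting, which is fine here because functional test names contain no spaces.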
I extracted the test runners into a bash script. Narrowed the unit tests down to just miner_tests and ipc_tests. Still no failure, so I cranked up the number of functional test runs and increased parallelism.
Force-pushed f92743d to 9e7f825
macOS hit a failure, though maybe not a very interesting one. More likely it's brittleness in the test itself under high CPU pressure:

```python
self.log.debug("Wait for another, get one after increase in fees in the mempool")
template6 = await wait_and_do(
    mining_wait_next_template(template5, stack, ctx, waitoptions),
    lambda: self.miniwallet.send_self_transfer(fee_rate=10, from_node=self.nodes[0]))
assert template6 is not None
```
So far it doesn't seem like we've reproduced the issues from #250. I slightly lowered the parallel pressure on macOS and slightly reduced the number of repetitions on the sanitizers to keep them around 10 minutes. I split the test script between unit and functional tests so I can track their time consumption separately. I also adjusted the functional test runner to abort and print the full log after a failure.
Ah wonderful, TSan found a Log

Does the test timeout not scale with --timeout-factor?

We should probably make
Unfortunately the TSan job doesn't reliably catch the known regression. I'd rather not make the job slower. Since it did catch it once, it would make sense to have longer runs on master. Let's add those in a followup. |
The IPC mining tests (interface_ipc_mining.py) currently use hardcoded timeouts (e.g., 1000ms, 60000ms) for operations like waitTipChanged and waiting for block templates. In heavily loaded CI environments, such as those running sanitizers with high parallelism, these hardcoded timeouts can be too short, leading to spurious test failures and brittleness. This commit multiplies these timeout variables by the test suite's global `self.options.timeout_factor`. This ensures that the IPC wait conditions scale appropriately when the test suite is run with a higher timeout factor, making the tests robust against slow execution environments. Addresses CI brittleness observed in bitcoin-core/libmultiprocess#253.
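A minimal sketch of the scaling this commit message describes (the class and method names here are illustrative stand-ins, not the actual interface_ipc_mining.py code; only the `timeout_factor` multiplication reflects the change):

```python
class Options:
    # Set from the test framework's --timeout-factor command-line option.
    timeout_factor = 4

class IPCMiningTestSketch:
    """Toy stand-in for the functional test; only the timeout math is real."""
    def __init__(self, options):
        self.options = options

    def wait_tip_changed_timeout_ms(self, base_ms=1000):
        # Previously hardcoded timeout, now scaled by the global factor
        # so slow sanitizer/CI runs get proportionally more time.
        return int(base_ms * self.options.timeout_factor)

t = IPCMiningTestSketch(Options())
print(t.wait_tip_changed_timeout_ms())        # 1000 ms base -> 4000
print(t.wait_tip_changed_timeout_ms(60000))   # 60000 ms base -> 240000
```

With the default `timeout_factor` of 1, the behavior is unchanged; only runs that explicitly raise the factor get longer waits.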
I've been running TSan on an AMD Ryzen 9 7950X with 16 cores and 64 GB RAM. It's not that easy to hit any of the failures #249 fixes. Sometimes it finds one after a few minutes, other times it finds nothing for half an hour. So I'm not surprised CI on this PR doesn't consistently hit them. For finding
If you feel fancy, you can try tsan-23 with the new adaptive delay feature (https://clang.llvm.org/docs/ThreadSanitizer.html#enabling-adaptive-delay). Though tsan-23 isn't released yet, so it's up to you whether you want to compile it here in CI or locally.
Slightly more elaborate version of #252 with ccache and depends caching.
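A hedged sketch of what the ccache and depends caching wiring might look like in a GitHub Actions job (the paths, keys, and `hashFiles` inputs below are assumptions for illustration, not the PR's actual configuration):

```yaml
- name: Cache depends
  uses: actions/cache@v4
  with:
    path: depends/built
    key: depends-${{ runner.os }}-${{ hashFiles('depends/packages/*.mk') }}

- name: Cache ccache
  uses: actions/cache@v4
  with:
    path: ~/.cache/ccache
    key: ccache-${{ runner.os }}-${{ github.run_id }}
    restore-keys: ccache-${{ runner.os }}-
```

Keying the ccache entry on `github.run_id` with a prefix `restore-keys` fallback is a common pattern: every run restores the newest cache and saves a fresh one.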