ci: remove Valgrind fuzz from CI matrix#34304
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34304. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste |
|
Concept ACK on this, thanks for addressing. I think we could disable on non-cirrus runners with: diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index daeefddcb60..173028ec4d1 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -518,7 +518,7 @@ jobs:
name: ${{ matrix.name }}
needs: runners
runs-on: ${{ needs.runners.outputs.provider == 'cirrus' && matrix.cirrus-runner || matrix.fallback-runner }}
- if: ${{ vars.SKIP_BRANCH_PUSH != 'true' || github.event_name == 'pull_request' }}
+ if: ${{ (vars.SKIP_BRANCH_PUSH != 'true' || github.event_name == 'pull_request') && (!matrix.cirrus-only || needs.runners.outputs.provider == 'cirrus') }}
timeout-minutes: ${{ matrix.timeout-minutes }}
env:
@@ -580,7 +580,7 @@ jobs:
- name: 'Valgrind, fuzz'
cirrus-runner: 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-md'
- fallback-runner: 'ubuntu-24.04'
+ cirrus-only: true
timeout-minutes: 240
file-env: './ci/test/00_setup_env_native_fuzz_with_valgrind.sh'
This may though impact downstream forks who are using non-GHA-non-cirrus runners (if there are any), but IMO they can easily maintain a revertion of this patch, in combination (most likely) with an existing patch they maintain to use their custom runner types. I don't think losing valgrind on forks is any concern to us though, as it will still run on all PRs and merges. |
|
@willcl-ark, I think we can probably do both. This PR changes a timeout that's provably too low - if you push the above as a separate PR, I will gladly review it, I think it's orthogonal to this change. |
.github/workflows/ci.yml
Outdated
| cirrus-runner: 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-md' | ||
| fallback-runner: 'ubuntu-24.04' | ||
| timeout-minutes: 240 | ||
| timeout-minutes: 360 |
There was a problem hiding this comment.
not sure about this. IIRC it was taking 3 hours, when the task was added, now 4 hours is not enough. More than 6 hours is not possible at all with GHA.
I am not aware of any issue that this task has found, which has not been found by the current msan task. Note that the msan task has library hardening enabled and recent msan versions also check memory for every function call (args and return values).
So I think it is probably easiest to remove this from the matrix. This is no different than the valgrind no-fuzz task, which is also not included in this matrix.
Of course, we should still keep the task config itself. Also, it should be run locally, but this is true with or without this pull request.
There was a problem hiding this comment.
Thanks, that's even simpler, pushed, updated the PR title & description and commit message, added you as coauthor.
|
~0. Not sure about having the test timeouts for this repository, determined by something outside this repository, where the resource availability is different (and undocumented).
It is though. Everything is fine in this repo, and someone running the tests somewhere else, without enough resources, isn't a good reason to bump timeouts and/or remove tests from this repository? |
The `Valgrind fuzz` job was introduced in e4b0463. Since then, ci/test/00_setup_env_native_fuzz_with_valgrind.sh seems to regularly time out on forked repositories (which fall back to GitHub-hosted runners), hitting the 240 minute job timeout. Remove it from the CI matrix, similar to how the valgrind no-fuzz task is handled. The setup script remains available for local testing. Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
a3d735a to
be95745
Compare
Valgrind fuzz timeout to pass with GHAValgrind fuzz from CI matrix
|
Thanks for the reviews.
The repo should be forkable, it was working correctly before, adding this new task changed that. |
|
NACK. |
faa4ab1 ci: Drop valgrind fuzz from GHA matrix (MarcoFalke) Pull request description: The valgrind fuzz task is problematic, because: * It is redundant with the msan fuzz task, which has std lib hardening enabled, so often UB is diagnosed before it even happens in the valgrind task. * All issues so far found by the valgrind fuzz task were also found by the hardened msan fuzz task. * All other issues were false-positives, which are hard to debug, and confusing and tedious to work around. I don't think there is any value in asking pull request authors to debug valgrind false-positives that they triggered by accident. So remove the task for now. I know that there are some devs, who like to keep the task, but if the task is kept, it should come with clear instructions on how to deal with false-postives in pull requests. I am not proposing to remove the config itself, and I am happy to continue maintaining it, like it was done before. However, as of now, running it in the GHA matrix is of negative or questionable benefit. ACKs for top commit: l0rinc: ACK faa4ab1 fanquake: ACK faa4ab1 - hopefully we can revisit re-adding soon. To be clear, I don't agree with the rationale from #34304, or the initial changes there. The case here, and the fact that it is causing disruption in this repo, is more pressing. Tree-SHA512: 59272f4b5b01c3b8ee6078ea635441f17776d4d8923f1adacdabdbb00bd2eb0234b30dc5b27938e29f8e79b3c3bebed5f339ae36c2c8fb17ea9d3a2884bee986
Problem definition
The
Valgrind fuzzjob was introduced in e4b0463.Since then,
ci/test/00_setup_env_native_fuzz_with_valgrind.shseems to regularly time out on forked repositories (which fall back to GitHub-hosted runners), hitting the 240 minute job timeout.This means that every PR I push on my own branch (I do that sometimes to make sure it passes CI before I push to
bitcoin/bitcoin) will always fail.Fix
Remove it from the CI matrix, similar to how the valgrind no-fuzz task is handled.
The setup script remains available for local testing.