Skip to content

ci: remove Valgrind fuzz from CI matrix#34304

Closed
l0rinc wants to merge 1 commit intobitcoin:masterfrom
l0rinc:l0rinc/gha-valgrind-timeout
Closed

ci: remove Valgrind fuzz from CI matrix#34304
l0rinc wants to merge 1 commit intobitcoin:masterfrom
l0rinc:l0rinc/gha-valgrind-timeout

Conversation

@l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Jan 15, 2026

Problem definition

The Valgrind fuzz job was introduced in e4b0463.
Since then, ci/test/00_setup_env_native_fuzz_with_valgrind.sh seems to regularly time out on forked repositories (which fall back to GitHub-hosted runners), hitting the 240 minute job timeout.

This means that every PR I push on my own branch (I do that sometimes to make sure it passes CI before I push to bitcoin/bitcoin) will always fail.

Fix

Remove it from the CI matrix, similar to how the valgrind no-fuzz task is handled.
The setup script remains available for local testing.

@DrahtBot DrahtBot added the Tests label Jan 15, 2026
@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 15, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/34304.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK fanquake
Concept ACK willcl-ark

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

@l0rinc l0rinc closed this Jan 15, 2026
@l0rinc l0rinc reopened this Jan 15, 2026
@willcl-ark
Copy link
Member

Concept ACK on this, thanks for addressing.

I think we could disable on non-cirrus runners with:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index daeefddcb60..173028ec4d1 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -518,7 +518,7 @@ jobs:
     name: ${{ matrix.name }}
     needs: runners
     runs-on: ${{ needs.runners.outputs.provider == 'cirrus' && matrix.cirrus-runner || matrix.fallback-runner }}
-    if: ${{ vars.SKIP_BRANCH_PUSH != 'true' || github.event_name == 'pull_request' }}
+    if: ${{ (vars.SKIP_BRANCH_PUSH != 'true' || github.event_name == 'pull_request') && (!matrix.cirrus-only || needs.runners.outputs.provider == 'cirrus') }}
     timeout-minutes: ${{ matrix.timeout-minutes }}
 
     env:
@@ -580,7 +580,7 @@ jobs:
 
           - name: 'Valgrind, fuzz'
             cirrus-runner: 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-md'
-            fallback-runner: 'ubuntu-24.04'
+            cirrus-only: true
             timeout-minutes: 240
             file-env: './ci/test/00_setup_env_native_fuzz_with_valgrind.sh'
 

This may though impact downstream forks who are using non-GHA-non-cirrus runners (if there are any), but IMO they can easily maintain a revertion of this patch, in combination (most likely) with an existing patch they maintain to use their custom runner types.

I don't think losing valgrind on forks is any concern to us though, as it will still run on all PRs and merges.

@l0rinc
Copy link
Contributor Author

l0rinc commented Jan 15, 2026

@willcl-ark, I think we can probably do both. This PR changes a timeout that's provably too low - if you push the above as a separate PR, I will gladly review it, I think it's orthogonal to this change.

cirrus-runner: 'ghcr.io/cirruslabs/ubuntu-runner-amd64:24.04-md'
fallback-runner: 'ubuntu-24.04'
timeout-minutes: 240
timeout-minutes: 360
Copy link
Member

@maflcko maflcko Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this. IIRC it was taking 3 hours, when the task was added, now 4 hours is not enough. More than 6 hours is not possible at all with GHA.

I am not aware of any issue that this task has found, which has not been found by the current msan task. Note that the msan task has library hardening enabled and recent msan versions also check memory for every function call (args and return values).

So I think it is probably easiest to remove this from the matrix. This is no different than the valgrind no-fuzz task, which is also not included in this matrix.

Of course, we should still keep the task config itself. Also, it should be run locally, but this is true with or without this pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's even simpler, pushed, updated the PR title & description and commit message, added you as coauthor.

@fanquake
Copy link
Member

fanquake commented Jan 15, 2026

~0. Not sure about having the test timeouts for this repository, determined by something outside this repository, where the resource availability is different (and undocumented).

now 4 hours is not enough.

It is though. Everything is fine in this repo, and someone running the tests somewhere else, without enough resources, isn't a good reason to bump timeouts and/or remove tests from this repository?

The `Valgrind fuzz` job was introduced in e4b0463.
Since then, ci/test/00_setup_env_native_fuzz_with_valgrind.sh seems to regularly time out on forked repositories (which fall back to GitHub-hosted runners), hitting the 240 minute job timeout.

Remove it from the CI matrix, similar to how the valgrind no-fuzz task is handled.
The setup script remains available for local testing.

Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^_@721217.xyz>
@l0rinc l0rinc force-pushed the l0rinc/gha-valgrind-timeout branch from a3d735a to be95745 Compare January 15, 2026 13:33
@l0rinc l0rinc changed the title ci: bump Valgrind fuzz timeout to pass with GHA ci: remove Valgrind fuzz from CI matrix Jan 15, 2026
@l0rinc
Copy link
Contributor Author

l0rinc commented Jan 15, 2026

Thanks for the reviews.

~0. Not sure about having the test timeouts for this repository, determined by something outside this repository, where the resource availability is different (and undocumented).

The repo should be forkable, it was working correctly before, adding this new task changed that.
But based on @maflcko's observations I have adjusted the PR.

@fanquake
Copy link
Member

NACK.

@l0rinc l0rinc closed this Jan 15, 2026
fanquake added a commit that referenced this pull request Feb 6, 2026
faa4ab1 ci: Drop valgrind fuzz from GHA matrix (MarcoFalke)

Pull request description:

  The valgrind fuzz task is problematic, because:

  * It is redundant with the msan fuzz task, which has std lib hardening enabled, so often UB is diagnosed before it even happens in the valgrind task.
  * All issues so far found by the valgrind fuzz task were also found by the hardened msan fuzz task.
  * All other issues were false-positives, which are hard to debug, and confusing and tedious to work around.

  I don't think there is any value in asking pull request authors to debug valgrind false-positives that they triggered by accident. So remove the task for now.

  I know that there are some devs, who like to keep the task, but if the task is kept, it should come with clear instructions on how to deal with false-postives in pull requests.

  I am not proposing to remove the config itself, and I am happy to continue maintaining it, like it was done before. However, as of now, running it in the GHA matrix is of negative or questionable benefit.

ACKs for top commit:
  l0rinc:
    ACK faa4ab1
  fanquake:
    ACK faa4ab1 - hopefully we can revisit re-adding soon. To be clear, I don't agree with the rationale from #34304, or the initial changes there. The case here, and the fact that it is causing disruption in this repo, is more pressing.

Tree-SHA512: 59272f4b5b01c3b8ee6078ea635441f17776d4d8923f1adacdabdbb00bd2eb0234b30dc5b27938e29f8e79b3c3bebed5f339ae36c2c8fb17ea9d3a2884bee986
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants