-
Notifications
You must be signed in to change notification settings - Fork 38.7k
ci: move ASan job to GitHub Actions from Cirrus CI #30193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
|
Concept ACK, may review tomorrow if GHA is back online by then. |
|
Concept ACK. |
ad401e0 to
9d26b3b
Compare
|
Question: Do we want to keep this job whole or split parts out (such as just running the USDT tests)? If we are keeping it whole as it is with ASAN and LSAN etc, I will investigate if we can get the IPV6 tests working. If I can't get them working, I would like to update the python IPV6 check as it's currently passing even though IPV6 isn't working on the runner. |
.github/workflows/ci.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is a single job chosen for a reson here? The free hosted GHA runners have 4 (v)CPUs available AFAIK...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been hitting issues on github actions runners cancelling the job after 4 minutes ,which is during compile, so this was an attempt to reduce the CPU load. It hasn't helped. See actions/runner-images#9848 (comment)
|
Concept ACK! Thanks for working on this! |
|
Please remove the Cirrus task in the same commit, to make review easier. |
|
Not sure. It seems more work and overhead to maintain two tasks, instead of one, no? |
|
Well, if you really wanted to do it, you could move it into the "test each commit" task, and make that one run every time, to keep the number of tasks the same. |
|
@maflcko happy to keep the job together as it is. Nothing prevents a split later if it's desired. I will look into the functional tests that require IPV6. |
d9326c8 to
3fd7bf0
Compare
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, except for the feedback
ci/test/02_run_container.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify in the name that this was set up for CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it could split this up into a separate commit/pull, as this affects more than just this one task?
ci/test/02_run_container.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works, does it? installation happens when the image is built (docker build), not when the tests are compiled and run.
See the Cirrus CI comment:
# In the image build step, no external environment variables are available,
# so any settings will need to be written to the settings env file:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you are right. This was a late change after I had already had the USDT tests running and didn't check that they were still running 🤦
ci/test/00_setup_env_native_asan.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that these things were not relevant for these sanitisation tests but if they are I will add them back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gui tests and benchmarks should also be running under the sanitizer, otherwise bugs may be missed in them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory, only the fuzz binary compilation could be excluded, because there is a separate task for the fuzz tests under asan. But either seems fine.
3fd7bf0 to
7896174
Compare
Allows IPV6 functional tests to run inside the container
7896174 to
b4b4f27
Compare
ci/test/02_run_container.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about exposing this. This is an expert-only setting. Anyone changing it needs to understand exactly what they are doing, so they might as well modify the ci/test/00_setup_env_native_asan.sh config file manually.
I'd prefer to keep the sed approach as-is. If you want to change it, it may be better to do in a separate pull request. This way, the focus here is kept on only moving the task from one infra to the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I changed it as I didn't understand why it needed the sed, but after I did understand, I thought it was nicer for someone to be able to run the USDT tests outside of CI by changing an environment variable rather than changing the contents of a file.
Happy to change it back to the sed approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was nicer for someone to be able to run the USDT tests outside of CI by changing an environment variable
This is not enough. They will have to set up the exact system (kernel) outside the CI, as is used inside the CI. Either in a VM or on metal.
Also the new build arg may invalidate the layer cache of unrelated images.
Seems fine, if you feel strongly. Happy to ACK either approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong feelings, switched it back to sed approach.
ci/test/02_run_container.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a fix for a bug outside of GitHub Actions? If yes, it would be good to have exact steps to test locally. Maybe even in a separate pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are two issues that this fixes. One is with the RPC server and another is with -proxy. Both issues stem from the fact that the containers have IPV6 but only on the loopback interface (::1).
I have made a separate PR for the -proxy one which is #30245 (comment)
There is an existing (closed) issue for the bind problem which is #13155 but this is harder to fix as it's in libevent. I was planning on adding my comments to that closed issue and perhaps reopening it.
If both these issues were fixed then we could remove this IPV6 network and all the tests should continue passing, including the IPV6 ones.
.cirrus.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should remove the noble type completely in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Moving it from Cirrus CI so it can be easier to maintain and used by forks
b4b4f27 to
9eea51d
Compare
|
review ACK 9eea51d lgtm |
|
Concept ACK |
hebasto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 9eea51d.
|
ACK 9eea51d |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
found a nit
| timeout-minutes: 120 | ||
| env: | ||
| FILE_ENV: "./ci/test/00_setup_env_native_asan.sh" | ||
| INSTALL_BCC_TRACING_TOOLS: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR to remove #30299
| run: sed -i "s|\${INSTALL_BCC_TRACING_TOOLS}|true|g" ./ci/test/00_setup_env_native_asan.sh | ||
|
|
||
| - name: CI script | ||
| run: ./ci/test_run_all.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could you link to a failing run? Just to double check that CI_FAILFAST_TEST_LEAVE_DANGLING works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Failing run: https://github.com/m3dwards/bitcoin/actions/runs/9354822480/job/25748533616
Looks ok to me, shows exit code 1.
For completeness, here is a run with a failing test with CI_FAILFAST_TEST_LEAVE_DANGLING removed: https://github.com/m3dwards/bitcoin/actions/runs/9562790625/job/26359909811. The job fails with exit code 137. The windows job behaves differently but I don't think that uses --failfast
518b06c ci: remove unused bcc variable from workflow (Max Edwards) Pull request description: Fixes #30193 (comment) ACKs for top commit: maflcko: lgtm ACK 518b06c Tree-SHA512: c87364670e26e15176ee21372a2cc100db0c275a5cffb37cc33ec4c2d85d6067b593bd4a6dea37bf478d2af197786df9dfac3cfb76db023c8db37184bb104458
| id: ccache-cache | ||
| uses: actions/cache/restore@v4 | ||
| with: | ||
| path: ${{ env.CCACHE_DIR }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/bitcoin/bitcoin/actions/runs/9579271766/job/26411361263#step:7:12
Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CCache is currently written to cache docker volume which I assume on Cirrus CI gets shared between multiple jobs? As a github actions is more idempotent / ephemeral this isn't going to work as the ccache dir on the host is left empty at the end of the run.
I can see that this is the same strategy used for depends output and sources and previous_releases.
The solution could be to modify 02_run_container.sh script to have "GHA mode" where instead of using docker volumes we just bind mount to directories on the host that the github cache action can copy from / to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right. I forgot about this. Your suggestion sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allows IPV6 functional tests to run inside the container Github-Pull: bitcoin#30193 Rebased-From: 4b527fa
Moving it from Cirrus CI so it can be easier to maintain and used by forks Github-Pull: bitcoin#30193 Rebased-From: 9eea51d
|
Backported to 27.x in #30305. |
b3093eb doc: Update rel notes for 27.x (fanquake) 6338f92 upnp: add compatibility for miniupnpc 2.2.8 (Cory Fields) f34e446 ci: remove unused bcc variable from workflow (Max Edwards) 0d524b1 ci: move Asan / LSan / USDT job to Github Actions (Max Edwards) 43c40dd ci: add IPV6 network to ci container (Max Edwards) Pull request description: Backports: * #30193 * #30283 * #30299 ACKs for top commit: willcl-ark: ACK b3093eb stickies-v: ACK b3093eb Tree-SHA512: 325149f2b388072276e10fae2ebb7d8f3f5138d75f237c0182a09c631334fc2af9c2fe500db31bf41e94d4f154771e3cd386f8eb0d09d7a1ad656f637b71e735
da205dd ci: increase available ccache size to 300MB (Max Edwards) 4ecbbd9 ci: add option for running tests without volume (Max Edwards) Pull request description: Fixes: #30193 (comment) Cache wasn't being saved when run on GHA because the default behaviour of the CI script was to store cache items in a docker volume. This works on Cirrus CI as the volumes are shared but it does not work on Github Actions in which each run is ephemeral. Kept the default behaviour the same so hopefully this continues to work for the Cirrus CI jobs. ACKs for top commit: maflcko: utACK da205dd hebasto: ACK da205dd. Tree-SHA512: 3b35482c0628adb60574a1462181ecfcb06cb237ed48beb6fe9aa51110be82f863dc9147e7f8d82960450aa6ecc3a24a70e3c8283fd24cdad075dbfb8fc93095
|
There is nothing to port to the CMake staging branch. Deleting the "Needs CMake port" label. |
PR for moving the ASAN + LSAN + USDT + friends job to github actions from Cirrus.
The motivation for this PR is that this task needs a full VM (or bare metal) to function, because of the tracepoints. It can not run in a container on an arbitrary Linux, because the outside machine must exactly match the specification of the distro used in the CI task config. This requires more maintenance for the persistent worker, and I think moving to GHA will reduce the maintenance burden, or at least make it possible for anyone to work on.
Also, it makes it easier to run the task on forks (bitcoin-inquisition, bitcoin-knots, devel forks, ...) without having to set-up a real machine.