Skip to content

Conversation

@m3dwards
Copy link
Contributor

@m3dwards m3dwards commented May 29, 2024

PR for moving the ASAN + LSAN + USDT + friends job to github actions from Cirrus.

The motivation for this PR is that this task needs a full VM (or bare metal) to function, because of the tracepoints. It can not run in a container on an arbitrary Linux, because the outside machine must exactly match the specification of the distro used in the CI task config. This requires more maintenance for the persistent worker, and I think moving to GHA will reduce the maintenance burden, or at least make it possible for anyone to work on.

Also, it makes it easier to run the task on forks (bitcoin-inquisition, bitcoin-knots, devel forks, ...) without having to set-up a real machine.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 29, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko, hebasto, achow101
Concept ACK 0xB10C, kristapsk

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30164 ([PoC] ci: Add FreeBSD GitHub Actions job by hebasto)
  • #29274 (Support self-hosted Cirrus workers on forks by Sjors)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot added the Tests label May 29, 2024
@maflcko
Copy link
Member

maflcko commented May 29, 2024

Concept ACK, may review tomorrow if GHA is back online by then.

@hebasto
Copy link
Member

hebasto commented May 29, 2024

Concept ACK.

@m3dwards m3dwards force-pushed the move-asan-to-gha branch from ad401e0 to 9d26b3b Compare May 29, 2024 15:07
@m3dwards
Copy link
Contributor Author

Question: Do we want to keep this job whole or split parts out (such as just running the USDT tests)?

If we are keeping it whole as it is with ASAN and LSAN etc, I will investigate if we can get the IPV6 tests working. If I can't get them working, I would like to update the python IPV6 check as it's currently passing even though IPV6 isn't working on the runner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a single job chosen for a reson here? The free hosted GHA runners have 4 (v)CPUs available AFAIK...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been hitting issues on github actions runners cancelling the job after 4 minutes ,which is during compile, so this was an attempt to reduce the CPU load. It hasn't helped. See actions/runner-images#9848 (comment)

@0xB10C
Copy link
Contributor

0xB10C commented May 30, 2024

Concept ACK! Thanks for working on this!

@maflcko
Copy link
Member

maflcko commented May 31, 2024

Please remove the Cirrus task in the same commit, to make review easier.

@m3dwards
Copy link
Contributor Author

m3dwards commented Jun 3, 2024

@maflcko @hebasto @fanquake do we want to just extract the USDT part of the job into Github Actions?

@maflcko
Copy link
Member

maflcko commented Jun 3, 2024

Not sure. It seems more work and overhead to maintain two tasks, instead of one, no?

@maflcko
Copy link
Member

maflcko commented Jun 3, 2024

Well, if you really wanted to do it, you could move it into the "test each commit" task, and make that one run every time, to keep the number of tasks the same.

@m3dwards
Copy link
Contributor Author

m3dwards commented Jun 3, 2024

@maflcko happy to keep the job together as it is. Nothing prevents a split later if it's desired. I will look into the functional tests that require IPV6.

@m3dwards m3dwards force-pushed the move-asan-to-gha branch 2 times, most recently from d9326c8 to 3fd7bf0 Compare June 7, 2024 11:47
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, except for the feedback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify in the name that this was set up for CI

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it could split this up into a separate commit/pull, as this affects more than just this one task?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works, does it? installation happens when the image is built (docker build), not when the tests are compiled and run.

See the Cirrus CI comment:

   # In the image build step, no external environment variables are available,
    # so any settings will need to be written to the settings env file:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you are right. This was a late change after I had already had the USDT tests running and didn't check that they were still running 🤦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

@m3dwards m3dwards Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that these things were not relevant for these sanitisation tests but if they are I will add them back in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gui tests and benchmarks should also be running under the sanitizer, otherwise bugs may be missed in them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, only the fuzz binary compilation could be excluded, because there is a separate task for the fuzz tests under asan. But either seems fine.

@m3dwards m3dwards changed the title ci: move ASAN job to GitHub Actions from Cirrus CI ci: move ASan job to GitHub Actions from Cirrus CI Jun 7, 2024
@m3dwards m3dwards force-pushed the move-asan-to-gha branch from 3fd7bf0 to 7896174 Compare June 8, 2024 15:53
Allows IPV6 functional tests to run inside the container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about exposing this. This is an expert-only setting. Anyone changing it needs to understand exactly what they are doing, so they might as well modify the ci/test/00_setup_env_native_asan.sh config file manually.

I'd prefer to keep the sed approach as-is. If you want to change it, it may be better to do in a separate pull request. This way, the focus here is kept on only moving the task from one infra to the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I changed it as I didn't understand why it needed the sed, but after I did understand, I thought it was nicer for someone to be able to run the USDT tests outside of CI by changing an environment variable rather than changing the contents of a file.

Happy to change it back to the sed approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was nicer for someone to be able to run the USDT tests outside of CI by changing an environment variable

This is not enough. They will have to set up the exact system (kernel) outside the CI, as is used inside the CI. Either in a VM or on metal.

Also the new build arg may invalidate the layer cache of unrelated images.

Seems fine, if you feel strongly. Happy to ACK either approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong feelings, switched it back to sed approach.

Copy link
Member

@maflcko maflcko Jun 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a fix for a bug outside of GitHub Actions? If yes, it would be good to have exact steps to test locally. Maybe even in a separate pull request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are two issues that this fixes. One is with the RPC server and another is with -proxy. Both issues stem from the fact that the containers have IPV6 but only on the loopback interface (::1).

I have made a separate PR for the -proxy one which is #30245 (comment)

There is an existing (closed) issue for the bind problem which is #13155 but this is harder to fix as it's in libevent. I was planning on adding my comments to that closed issue and perhaps reopening it.

If both these issues were fixed then we could remove this IPV6 network and all the tests should continue passing, including the IPV6 ones.

.cirrus.yml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove the noble type completely in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Moving it from Cirrus CI so it can be easier to maintain and used by forks
@maflcko
Copy link
Member

maflcko commented Jun 12, 2024

review ACK 9eea51d

lgtm

@DrahtBot DrahtBot requested review from 0xB10C and hebasto June 12, 2024 15:30
@m3dwards m3dwards marked this pull request as ready for review June 12, 2024 15:37
@kristapsk
Copy link
Contributor

Concept ACK

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 9eea51d.

@achow101
Copy link
Member

ACK 9eea51d

@achow101 achow101 merged commit d97ddbe into bitcoin:master Jun 17, 2024
@m3dwards m3dwards deleted the move-asan-to-gha branch June 17, 2024 20:38
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found a nit

timeout-minutes: 120
env:
FILE_ENV: "./ci/test/00_setup_env_native_asan.sh"
INSTALL_BCC_TRACING_TOOLS: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can remove this.

Copy link
Contributor Author

@m3dwards m3dwards Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR to remove #30299

run: sed -i "s|\${INSTALL_BCC_TRACING_TOOLS}|true|g" ./ci/test/00_setup_env_native_asan.sh

- name: CI script
run: ./ci/test_run_all.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you link to a failing run? Just to double check that CI_FAILFAST_TEST_LEAVE_DANGLING works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing run: https://github.com/m3dwards/bitcoin/actions/runs/9354822480/job/25748533616

Looks ok to me, shows exit code 1.

For completeness, here is a run with a failing test with CI_FAILFAST_TEST_LEAVE_DANGLING removed: https://github.com/m3dwards/bitcoin/actions/runs/9562790625/job/26359909811. The job fails with exit code 137. The windows job behaves differently but I don't think that uses --failfast

fanquake added a commit that referenced this pull request Jun 18, 2024
518b06c ci: remove unused bcc variable from workflow (Max Edwards)

Pull request description:

  Fixes #30193 (comment)

ACKs for top commit:
  maflcko:
    lgtm ACK 518b06c

Tree-SHA512: c87364670e26e15176ee21372a2cc100db0c275a5cffb37cc33ec4c2d85d6067b593bd4a6dea37bf478d2af197786df9dfac3cfb76db023c8db37184bb104458
id: ccache-cache
uses: actions/cache/restore@v4
with:
path: ${{ env.CCACHE_DIR }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/bitcoin/bitcoin/actions/runs/9579271766/job/26411361263#step:7:12

Warning: Path Validation Error: Path(s) specified in the action for caching do(es) not exist, hence no cache is being saved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCache is currently written to cache docker volume which I assume on Cirrus CI gets shared between multiple jobs? As a github actions is more idempotent / ephemeral this isn't going to work as the ccache dir on the host is left empty at the end of the run.

I can see that this is the same strategy used for depends output and sources and previous_releases.

The solution could be to modify 02_run_container.sh script to have "GHA mode" where instead of using docker volumes we just bind mount to directories on the host that the github cache action can copy from / to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. I forgot about this. Your suggestion sounds good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 19, 2024
Allows IPV6 functional tests to run inside the container

Github-Pull: bitcoin#30193
Rebased-From: 4b527fa
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 19, 2024
Moving it from Cirrus CI so it can be easier to maintain and used by forks

Github-Pull: bitcoin#30193
Rebased-From: 9eea51d
@fanquake fanquake mentioned this pull request Jun 19, 2024
@fanquake
Copy link
Member

Backported to 27.x in #30305.

fanquake added a commit that referenced this pull request Jun 24, 2024
b3093eb doc: Update rel notes for 27.x (fanquake)
6338f92 upnp: add compatibility for miniupnpc 2.2.8 (Cory Fields)
f34e446 ci: remove unused bcc variable from workflow (Max Edwards)
0d524b1 ci: move Asan / LSan / USDT job to Github Actions (Max Edwards)
43c40dd ci: add IPV6 network to ci container (Max Edwards)

Pull request description:

  Backports:
  * #30193
  * #30283
  * #30299

ACKs for top commit:
  willcl-ark:
    ACK b3093eb
  stickies-v:
    ACK b3093eb

Tree-SHA512: 325149f2b388072276e10fae2ebb7d8f3f5138d75f237c0182a09c631334fc2af9c2fe500db31bf41e94d4f154771e3cd386f8eb0d09d7a1ad656f637b71e735
fanquake added a commit that referenced this pull request Jun 24, 2024
da205dd ci: increase available ccache size to 300MB (Max Edwards)
4ecbbd9 ci: add option for running tests without volume (Max Edwards)

Pull request description:

  Fixes: #30193 (comment)

  Cache wasn't being saved when run on GHA because the default behaviour of the CI script was to store cache items in a docker volume. This works on Cirrus CI as the volumes are shared but it does not work on Github Actions in which each run is ephemeral.

  Kept the default behaviour the same so hopefully this continues to work for the Cirrus CI jobs.

ACKs for top commit:
  maflcko:
    utACK da205dd
  hebasto:
    ACK da205dd.

Tree-SHA512: 3b35482c0628adb60574a1462181ecfcb06cb237ed48beb6fe9aa51110be82f863dc9147e7f8d82960450aa6ecc3a24a70e3c8283fd24cdad075dbfb8fc93095
@hebasto
Copy link
Member

hebasto commented Jul 14, 2024

There is nothing to port to the CMake staging branch. Deleting the "Needs CMake port" label.

@bitcoin bitcoin locked and limited conversation to collaborators Jul 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants