ci: add workflow to run GPU tests on selfhosted runner by ariostas · Pull Request #3543 · scikit-hep/awkward

ariostas · 2025-06-13T13:57:35Z

This PR adds a workflow to run the GPU tests on the new CI node at Princeton. There are a couple of things to discuss:

I intentionally didn't add the new job to pass-tests.needs since I think we should first make sure that it is reliable enough that it's not going to end up blocking PR regularly.
For now, I added it as a job in the usual test.yml workflow. This means that it will run for all PRs and pushes to main. There are some security concerns with this, so we could consider moving it to a separate workflow that needs to be triggered by a workflow_dispatch trigger, which only people with write permissions can do. And we could also run nightly tests to make sure things are not breaking. But I think it might also be fine to leave it it with the normal workflow for the following reasons:
- For first-time contributors we need to approve running the tests.
- I have the runner running inside a container with very limited access to the filesystem. So even if they wipe everything it's fine. I think a bigger concern would be someone trying to mine bitcoin or use the cluster for a DDOS attack or something.
- Many of us have eyes on this repo, so I think if someone tries to do something we'll find out and stop it pretty quickly.

I don't have permissions to add a runner for this repo, so I'll need someone (probably @ianna) to help me with that. I tested it on my fork and it's working well. Here are the logs for a test run I did. There's actually one test that fails, so we should look into that.

After we settle all these things, I'll add a page in the wiki describing how to set up the selfhoster runner.

ikrommyd · 2025-06-13T14:01:41Z

you will probably be getting this: #3480 which needs fixing

ariostas · 2025-06-13T14:03:46Z

you will probably be getting this: #3480 which needs fixing

Oh yeah, that's the one. I didn't realize it was already reported

codecov · 2025-06-13T14:07:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.42%. Comparing base (b749e49) to head (fa1cd5e).
Report is 361 commits behind head on main.

Additional details and impacted files

see 195 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ikrommyd · 2025-06-13T14:40:28Z

you will probably be getting this: #3480 which needs fixing

Oh yeah, that's the one. I didn't realize it was already reported

Yeah I found this some time ago on my desktop. I rarely install cudf though.

ianna

@ariostas - Great job! I'll add the runner :-)

requirements-test-gpu.txt

ianna · 2025-06-16T17:47:25Z

@ariostas - I suggest we mark the cudf string test as failed to move on with the PR. Thanks!

tests-cuda/test_3051_to_cuda.py::test_null PASSED                        [ 24%]
Error: test_strings

TypeError: StringColumn.__init__() missing 2 required positional arguments: 'size' and 'dtype'

This error occurred while calling

    ak.to_cudf(
        <Array ['hey', 'hi', 'hum'] type='3 * string'>
    )

ariostas · 2025-06-17T07:27:34Z

I suggest we mark the cudf string test as failed to move on with the PR. Thanks!

Sounds good. I'll make a couple of final tweaks. I'm just trying to simplify the setup a bit.

ariostas · 2025-06-17T08:34:04Z

Okay, this is ready now

ianna · 2025-06-17T08:37:08Z

Okay, this is ready now

I'll enable auto-merge then. Thanks!

ariostas · 2025-06-17T08:44:34Z

@ianna would the wiki be a good place to write some documentation about how to set up the runner?

ianna · 2025-06-17T08:52:19Z

@ianna would the wiki be a good place to write some documentation about how to set up the runner?

if it's awkward-specific, I think a maintainer-guide directory in docs would be good.

Added job to run GPU tests on selfhosted runner

cb69fd1

ianna approved these changes Jun 13, 2025

View reviewed changes

requirements-test-gpu.txt Outdated Show resolved Hide resolved

ariostas added 2 commits June 13, 2025 11:45

Add numba-cuda requirement to GPU tests

56db7ac

Merge remote-tracking branch 'origin/main' into selfhosted_runner

cf1d576

ariostas force-pushed the selfhosted_runner branch from 190585d to cf1d576 Compare June 16, 2025 10:59

Try using a container that already has cuda

89cfb68

ariostas force-pushed the selfhosted_runner branch from 50443f5 to cf86ace Compare June 17, 2025 08:26

Mark failing test as xfail for now

fa1cd5e

ariostas force-pushed the selfhosted_runner branch from cf86ace to fa1cd5e Compare June 17, 2025 08:30

ariostas marked this pull request as ready for review June 17, 2025 08:33

ianna enabled auto-merge (squash) June 17, 2025 08:37

ianna merged commit f14d248 into scikit-hep:main Jun 17, 2025
44 checks passed

ariostas mentioned this pull request Jun 17, 2025

ci: run cuda kernel tests #3551

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add workflow to run GPU tests on selfhosted runner#3543

ci: add workflow to run GPU tests on selfhosted runner#3543
ianna merged 5 commits intoscikit-hep:mainfrom
ariostas:selfhosted_runner

ariostas commented Jun 13, 2025

Uh oh!

ikrommyd commented Jun 13, 2025

Uh oh!

ariostas commented Jun 13, 2025

Uh oh!

codecov bot commented Jun 13, 2025 •

edited

Loading

Uh oh!

ikrommyd commented Jun 13, 2025

Uh oh!

ianna left a comment

Uh oh!

Uh oh!

ianna commented Jun 16, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

ianna commented Jun 17, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

Uh oh!

ianna commented Jun 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ariostas commented Jun 13, 2025

Uh oh!

ikrommyd commented Jun 13, 2025

Uh oh!

ariostas commented Jun 13, 2025

Uh oh!

codecov bot commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ikrommyd commented Jun 13, 2025

Uh oh!

ianna left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ianna commented Jun 16, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

ianna commented Jun 17, 2025

Uh oh!

ariostas commented Jun 17, 2025

Uh oh!

Uh oh!

ianna commented Jun 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Jun 13, 2025 •

edited

Loading