Skip to content

ci: add workflow to run GPU tests on selfhosted runner#3543

Merged
ianna merged 5 commits intoscikit-hep:mainfrom
ariostas:selfhosted_runner
Jun 17, 2025
Merged

ci: add workflow to run GPU tests on selfhosted runner#3543
ianna merged 5 commits intoscikit-hep:mainfrom
ariostas:selfhosted_runner

Conversation

@ariostas
Copy link
Copy Markdown
Member

This PR adds a workflow to run the GPU tests on the new CI node at Princeton. There are a couple of things to discuss:

  • I intentionally didn't add the new job to pass-tests.needs since I think we should first make sure that it is reliable enough that it's not going to end up blocking PR regularly.
  • For now, I added it as a job in the usual test.yml workflow. This means that it will run for all PRs and pushes to main. There are some security concerns with this, so we could consider moving it to a separate workflow that needs to be triggered by a workflow_dispatch trigger, which only people with write permissions can do. And we could also run nightly tests to make sure things are not breaking. But I think it might also be fine to leave it it with the normal workflow for the following reasons:
    • For first-time contributors we need to approve running the tests.
    • I have the runner running inside a container with very limited access to the filesystem. So even if they wipe everything it's fine. I think a bigger concern would be someone trying to mine bitcoin or use the cluster for a DDOS attack or something.
    • Many of us have eyes on this repo, so I think if someone tries to do something we'll find out and stop it pretty quickly.

I don't have permissions to add a runner for this repo, so I'll need someone (probably @ianna) to help me with that. I tested it on my fork and it's working well. Here are the logs for a test run I did. There's actually one test that fails, so we should look into that.

After we settle all these things, I'll add a page in the wiki describing how to set up the selfhoster runner.

@ikrommyd
Copy link
Copy Markdown
Collaborator

you will probably be getting this: #3480 which needs fixing

@ariostas
Copy link
Copy Markdown
Member Author

you will probably be getting this: #3480 which needs fixing

Oh yeah, that's the one. I didn't realize it was already reported

@codecov
Copy link
Copy Markdown

codecov bot commented Jun 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.42%. Comparing base (b749e49) to head (fa1cd5e).
Report is 361 commits behind head on main.

Additional details and impacted files

see 195 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ikrommyd
Copy link
Copy Markdown
Collaborator

you will probably be getting this: #3480 which needs fixing

Oh yeah, that's the one. I didn't realize it was already reported

Yeah I found this some time ago on my desktop. I rarely install cudf though.

Copy link
Copy Markdown
Member

@ianna ianna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ariostas - Great job! I'll add the runner :-)

@ariostas ariostas force-pushed the selfhosted_runner branch from 190585d to cf1d576 Compare June 16, 2025 10:59
@ianna
Copy link
Copy Markdown
Member

ianna commented Jun 16, 2025

@ariostas - I suggest we mark the cudf string test as failed to move on with the PR. Thanks!

tests-cuda/test_3051_to_cuda.py::test_null PASSED                        [ 24%]
Error: test_strings

TypeError: StringColumn.__init__() missing 2 required positional arguments: 'size' and 'dtype'

This error occurred while calling

    ak.to_cudf(
        <Array ['hey', 'hi', 'hum'] type='3 * string'>
    )

@ariostas
Copy link
Copy Markdown
Member Author

I suggest we mark the cudf string test as failed to move on with the PR. Thanks!

Sounds good. I'll make a couple of final tweaks. I'm just trying to simplify the setup a bit.

@ariostas ariostas force-pushed the selfhosted_runner branch from 50443f5 to cf86ace Compare June 17, 2025 08:26
@ariostas ariostas force-pushed the selfhosted_runner branch from cf86ace to fa1cd5e Compare June 17, 2025 08:30
@ariostas ariostas marked this pull request as ready for review June 17, 2025 08:33
@ariostas
Copy link
Copy Markdown
Member Author

Okay, this is ready now

@ianna
Copy link
Copy Markdown
Member

ianna commented Jun 17, 2025

Okay, this is ready now

I'll enable auto-merge then. Thanks!

@ianna ianna enabled auto-merge (squash) June 17, 2025 08:37
@ariostas
Copy link
Copy Markdown
Member Author

@ianna would the wiki be a good place to write some documentation about how to set up the runner?

@ianna ianna merged commit f14d248 into scikit-hep:main Jun 17, 2025
44 checks passed
@ianna
Copy link
Copy Markdown
Member

ianna commented Jun 17, 2025

@ianna would the wiki be a good place to write some documentation about how to set up the runner?

if it's awkward-specific, I think a maintainer-guide directory in docs would be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants