Add CI workflow for tests that requires pytorch CUDA. #7073
vanbasten23 wants to merge 34 commits into master
Note to myself: it seems that BUILD XLA CUDA plugin requires the env var
It seems that without installing the CUDA plugin, the tests fail with this error: https://gist.github.com/vanbasten23/7dd6ddeaad93843e57653990c43cf476
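A guard step ahead of the tests could surface this failure earlier. A minimal sketch, assuming the plugin is installed as a pip distribution named torch-xla-cuda-plugin; the step name is hypothetical and not taken from this PR:

```yaml
# Hypothetical guard step, not part of this PR.
# Assumes the CUDA plugin is installed under the pip name torch-xla-cuda-plugin.
- name: Check that the CUDA plugin is installed
  shell: bash
  run: |
    # `pip show` exits non-zero when the distribution is missing,
    # so this step fails before any test starts.
    pip show torch-xla-cuda-plugin
```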
cuda_t = torch.arange(25, device=torch.device('cuda')).reshape(5, 5)

t1 = cuda_t.t()
print('xw32 t1.device=', t1.device)
  BAZEL_REMOTE_CACHE: 1
  BUILD_CPP_TESTS: 1
steps:
  - name: Setup gcloud
This setup is repetitive. I was already on the fence about encapsulating it in a new action (since it already appears in the build action, test actions, and the docs push). If we add another copy, it really should be encapsulated so we don't have to update a bunch of places at once.
    name: torch-with-cuda-xla-with-cuda-wheels
    path: /tmp/wheels/
    pattern: torch-*.whl
- name: Fetch CUDA plugin
This setup is repetitive. I was already on the fence about encapsulating it in a new action (since it already appears in the build action, test actions, and the docs push). If we add another copy that's more-or-less identical, it really should be encapsulated so we don't have to update a bunch of places at once.
Stepping back, is there a way to merge this with _test.yml? You would need to add a parameter for some of the test groups to install the torch CUDA build.
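One possible shape for that parameter, as a rough sketch only. It assumes _test.yml is a reusable workflow triggered by workflow_call; the input name install-cuda-torch, the job layout, and the artifact name are all hypothetical:

```yaml
# Hypothetical excerpt of a reusable test workflow such as _test.yml.
# The input name, job name, and artifact name are illustrative assumptions.
on:
  workflow_call:
    inputs:
      install-cuda-torch:
        description: Install the CUDA-enabled torch wheel before running tests
        type: boolean
        required: false
        default: false

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      # Only test groups that pass install-cuda-torch: true run these two steps.
      - name: Fetch CUDA-enabled torch wheel
        if: ${{ inputs.install-cuda-torch }}
        uses: actions/download-artifact@v4
        with:
          name: torch-with-cuda-wheels   # assumed artifact name
          path: /tmp/wheels/
      - name: Install CUDA-enabled torch wheel
        if: ${{ inputs.install-cuda-torch }}
        shell: bash
        run: pip install /tmp/wheels/torch-*.whl
```

Test groups that do not need CUDA-enabled PyTorch would leave the input at its default and skip both steps, so a single workflow file could cover both cases.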
I can see your point. Let me give it a try.
  shell: bash
  run: |
    cd pytorch/xla/infra/ansible
    ansible-playbook playbook.yaml -vvv -e "stage=build arch=amd64 accelerator=cuda cuda_compute_capabilities=5.2,7.5 src_root=${GITHUB_WORKSPACE} build_cpp_tests=1 git_versioned_xla_build=1 cache_suffix=-ci build_pytorch_with_cuda=1" --skip-tags=fetch_srcs,install_deps
I know this goes against my philosophy of "put everything in ansible", but what if you just build the PyTorch CUDA wheel directly here? I don't think we should build multiple copies of torch_xla and torchvision.
Building PyTorch with CUDA support is only part of our CI workflow, and it will never be part of our release workflow. It's okay in my mind to just run USE_CUDA=1 python setup.py bdist_wheel directly here and upload only the torch GPU wheel as an artifact.
The test workflow can then use the same torch-xla, torch-xla-cuda-plugin, and torchvision.
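As a sketch of what that could look like in the build job: the step names, the pytorch working directory, and the artifact name below are assumptions rather than anything taken from this PR. Only the build command comes from the comment above.

```yaml
# Hypothetical build steps; names and paths are illustrative assumptions.
- name: Build PyTorch CUDA wheel
  shell: bash
  working-directory: pytorch        # assumes the PyTorch checkout lives here
  run: |
    # setup.py bdist_wheel writes the wheel into ./dist
    USE_CUDA=1 python setup.py bdist_wheel
- name: Upload PyTorch CUDA wheel
  uses: actions/upload-artifact@v4
  with:
    name: torch-cuda-wheel          # hypothetical artifact name
    path: pytorch/dist/torch-*.whl
```

With this layout only the torch wheel is rebuilt with CUDA, and the torch_xla and torchvision artifacts stay shared across test groups.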
Thanks for working on this. Added myself as a reviewer as I also need this for Triton tests.
Closing this in favor of #7140.
This PR adds a new CI workflow that builds PyTorch with CUDA enabled from source, builds pytorch/xla with CUDA enabled from source, then runs tests. The intention is to run tests that require PyTorch with CUDA.
In detail, this PR adds 2 more jobs to .github/workflows/build_and_test.yml