try setting MAX_JOBS=4 for oom in arm wheel #1804
tinglvv wants to merge 33 commits into pytorch:main
@pytorchbot rebase
nWEIdia
left a comment
Re-reviewing since we are having libopenblas.so test issues.
@pytorchbot rebase
Please rebase so that the s390x errors no longer show up: https://hud.pytorch.org/pytorch/pytorch/pull/126174
For the CUDA test failures, we need to wait for ARM + CUDA instance availability, e.g. https://aws.amazon.com/ec2/instance-types/g5g/
nWEIdia
left a comment
A rebase is needed to fix some IBM errors.
Otherwise, looks great!
Thanks for reviewing. I think the blocker is not instance availability; we need the SBSA NVIDIA driver 550.54.15 to be uploaded to AWS. I started https://github.com/pytorch/test-infra/pull/5218/files, to be merged once we upload the SBSA NVIDIA driver runfile to https://s3.amazonaws.com/ossci-linux/nvidia_driver/.
@pytorchbot rebase
The error message was "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx". Wouldn't we need an NVIDIA GPU first, before installing a driver? The M7G instance does not have an NVIDIA GPU.
And good catch: eventually we will need the SBSA NVIDIA driver 550.54.15 to be uploaded to AWS for the test to work.
The rebase command may not work on the pytorch/builder repo. A manual rebase is needed.
Yes, we will need an ARM+CUDA instance. Thanks for catching that.
* Disable automatic building of s390x docker image
* Update docker image and build scripts for s390x
* Switch devtoolset to 13. There is a not-yet-investigated build failure caused by gcc 12, but it doesn't reproduce with gcc 13.
* Adapt binaries check for s390x
* Switch to ubuntu:24.04 for s390x
* Update libgomp.so.1 path for s390x
This reverts commit 6b90c09.
* Don't deactivate/remove conda on linux
* test
* Add manylinux_2_28 image
* Manylinux 2_28 fix cmake install
* fix
This reverts commit bebc062.
Please ignore the above commits created by the rebase; I will resolve these later.
Hitting an OOM error when building the CUDA ARM wheel: https://github.com/pytorch/pytorch/actions/runs/8840652730/job/24276381274?pr=124112
Try changing MAX_JOBS.
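For illustration only, here is a rough sketch of the reasoning behind capping MAX_JOBS: the number of parallel compile jobs is bounded by memory as well as by CPU count. The helper names `pick_max_jobs` and `build_env` and the 4 GiB-per-job figure are hypothetical and not taken from this PR. The only thing assumed about the build is that it reads the `MAX_JOBS` environment variable.

```python
import os


def pick_max_jobs(cpu_count, mem_gib, gib_per_job=4.0, cap=None):
    """Return a parallel-job count bounded by both CPU count and memory.

    gib_per_job is an assumed peak memory per compile job (hypothetical,
    not measured); cap is an optional hard upper limit such as 4.
    """
    by_memory = int(mem_gib // gib_per_job)
    jobs = max(1, min(cpu_count, by_memory))  # never drop below one job
    if cap is not None:
        jobs = max(1, min(jobs, cap))
    return jobs


def build_env(max_jobs, base_env=None):
    """Copy an environment mapping and set MAX_JOBS for the build process."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_JOBS"] = str(max_jobs)
    return env


if __name__ == "__main__":
    # Example: a 64-core ARM builder with 16 GiB of RAM, under the assumptions above.
    jobs = pick_max_jobs(cpu_count=64, mem_gib=16)
    print(build_env(jobs, base_env={})["MAX_JOBS"])
```

The resulting mapping could be passed as `env=` to `subprocess.run` when invoking the wheel build. Hard-coding `MAX_JOBS=4`, as this PR tries, is the simpler option when the builder's memory is known in advance.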