[ROCm] Move ROCm CI build to python 3.8 version #86677

Closed
pruthvistony wants to merge 1 commit into pytorch:master from ROCm:rocm_py38_upgrade

@pruthvistony
Collaborator

@pruthvistony pruthvistony commented Oct 11, 2022

Currently the ROCm CI build uses Python 3.7; this PR moves it to Python 3.8.

cc @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport

@pytorch-bot

pytorch-bot bot commented Oct 11, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86677

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 3 Pending

As of commit 489c914:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the module: rocm and topic: not user facing labels Oct 11, 2022
@pruthvistony added the ciflow/trunk and ciflow/periodic labels Oct 11, 2022
@pruthvistony
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #86677, but it was already up to date

@malfet
Contributor

malfet commented Oct 11, 2022

Created pytorch-linux-focal-rocm5.2-py3.8 and pytorch-linux-focal-rocm5.1-py3.8 ecr containers

@pruthvistony pruthvistony marked this pull request as ready for review October 11, 2022 22:18
@pruthvistony pruthvistony requested review from a team and jeffdaily as code owners October 11, 2022 22:18
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 12, 2022
@pytorch pytorch deleted a comment from pytorch-bot bot Oct 12, 2022
@samdow added the triaged label Oct 12, 2022
@pruthvistony
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule OSS CI):

Dig deeper by viewing the failures on hud

@pruthvistony
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm_py38_upgrade onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm_py38_upgrade && git pull --rebase)

@pruthvistony
Collaborator Author

pruthvistony commented Oct 14, 2022

@seemethere @malfet,

I see the failures below in a few CUDA jobs:

ERROR [0.003s]: test_nn_functional_group_norm (__main__.TestFunctionalTracing)

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/test_fx.py", line 4075, in functional_test
    symbolic_trace(fn)
  File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 1070, in symbolic_trace
    graph = tracer.trace(root, concrete_args)
  File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 739, in trace
    (self.create_arg(fn(*args)),),
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2523, in group_norm
    _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(input.size()[2:]))
  File "/opt/conda/lib/python3.10/site-packages/torch/fx/proxy.py", line 274, in __iter__
    return self.tracer.iter(self)
  File "/opt/conda/lib/python3.10/site-packages/torch/fx/proxy.py", line 183, in iter
    raise TraceError('Proxy object cannot be iterated. This can be '
torch.fx.proxy.TraceError: Proxy object cannot be iterated. This can be attempted when the Proxy is used in a loop or as a *args or **kwargs function argument. See the torch.fx docs on pytorch.org for a more detailed explanation of what types of control flow can be traced, and check out the Proxy docstring for help troubleshooting Proxy iteration errors

As I understand it, this PR triggers a rebuild of all CI images for ROCm, but I am not sure how the CUDA image is changing. Can you please help me understand why the CUDA jobs are failing? Looking at the error, it appears to be related to the Python version (#60069).
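
One quick way to check which interpreter the failing jobs ran under is to read the version out of the `site-packages` paths in the traceback. The sketch below is only an illustration: the helper name `python_versions_in_traceback` is hypothetical, and the frame lines are a subset copied from the traceback in this comment. It reports 3.10, which is neither the 3.7 nor the 3.8 that this PR concerns.

```python
import re
from typing import List

# A few frame lines copied from the traceback quoted in this thread.
TRACEBACK = '''\
File "/var/lib/jenkins/workspace/test/test_fx.py", line 4075, in functional_test
File "/opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py", line 1070, in symbolic_trace
File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2523, in group_norm
File "/opt/conda/lib/python3.10/site-packages/torch/fx/proxy.py", line 183, in iter
'''


def python_versions_in_traceback(text: str) -> List[str]:
    """Return the distinct X.Y versions found in 'pythonX.Y' path components.

    Hypothetical helper for illustration; it only inspects file paths.
    """
    return sorted(set(re.findall(r"/python(\d+\.\d+)/", text)))


print(python_versions_in_traceback(TRACEBACK))  # prints ['3.10']
```

This only shows which Python the failing CUDA jobs used; it does not by itself explain why those jobs were affected.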

@huydhn
Contributor

huydhn commented Oct 15, 2022

Surprisingly, I'm also seeing exactly the same failure on my "benign" PR #86993, which updates the docker image. So it's not a coincidence. Let me dig a bit deeper to understand what's going on here.
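
For readers unfamiliar with the error itself: the traceback earlier in this thread shows `list(...)` being applied, during symbolic tracing, to a value that is a Proxy rather than a real size. The toy sketch below imitates that mechanism in plain Python. `ToyProxy`, `tail_dims`, and the local `TraceError` are hypothetical stand-ins, not the torch.fx classes, and the sketch does not explain why the failure showed up in these CI jobs.

```python
class TraceError(ValueError):
    """Toy stand-in for the tracing error (assumption: not the torch.fx class)."""


class ToyProxy:
    """Records operations as text instead of computing real values."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def size(self) -> "ToyProxy":
        # A traced call yields another proxy, not a concrete shape.
        return ToyProxy(f"{self.expr}.size()")

    def __getitem__(self, key) -> "ToyProxy":
        # Indexing or slicing is also just recorded.
        return ToyProxy(f"{self.expr}[{key!r}]")

    def __iter__(self):
        # The number of elements is unknown while tracing, so iteration fails.
        raise TraceError("Proxy object cannot be iterated.")


def tail_dims(x):
    # Same shape of expression as the failing line in the traceback.
    return list(x.size()[2:])


try:
    tail_dims(ToyProxy("input"))
except TraceError as exc:
    print(f"TraceError: {exc}")  # prints: TraceError: Proxy object cannot be iterated.
```

The point of the sketch is only that `list()` needs to iterate its argument, which a placeholder value recorded at trace time cannot support.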

@huydhn
Contributor

huydhn commented Oct 24, 2022

Rebasing the PR worked for me in #86993, so I suspect these were unrelated test failures (either flaky or since fixed).

@huydhn
Contributor

huydhn commented Oct 24, 2022

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased rocm_py38_upgrade onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout rocm_py38_upgrade && git pull --rebase)

@jithunnair-amd
Collaborator

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Currently it is python 3.7 want to upgrade to python 3.8
Pull Request resolved: pytorch#86677
Approved by: https://github.com/malfet
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Currently it is python 3.7 want to upgrade to python 3.8
Pull Request resolved: pytorch#86677
Approved by: https://github.com/malfet
Labels

ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)