Fixed output memory format mismatch for bicubic2d#90470

Closed
vfdev-5 wants to merge 3 commits into pytorch:master from vfdev-5:fix-bicubic-out-mf

Conversation

@vfdev-5
Contributor

@vfdev-5 vfdev-5 commented Dec 8, 2022

Description:

  • Output memory format now matches the input memory format for bicubic2d

Problem: output tensor's memory format does not match input format for bicubic2d

```python
import torch

i = torch.rand(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
assert i.is_contiguous(memory_format=torch.channels_last)
o = torch.nn.functional.interpolate(i, size=(4, 4), mode="bicubic")
assert o.is_contiguous(memory_format=torch.channels_last), f"Should be channels last but given channels first ({o.is_contiguous(memory_format=torch.contiguous_format)})"
```

> AssertionError: Should be channels last but given channels first (True)

Related PR fixing bilinear ops: #53535 (cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @bdhirsh )

Discovered together with @NicolasHug while working on https://github.com/pytorch/pytorch/tree/interpolate_uint8_images_linear_cpu_support_dev

  • Updated code to match grad input / output memory formats
  • Temporary tensor creation now matches the input memory format in `separable_upsample_generic_Nd_kernel_impl`
  • Updated tests
  • Added missing forward AD support for bicubic with antialiasing
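For context, the mismatch described above is visible purely in the strides: a channels_last tensor keeps its (N, C, H, W) shape but stores its data in NHWC order, so only the strides differ. A minimal pure-Python sketch (no PyTorch required; the function names here are illustrative, not torch API) of the strides each format implies:

```python
# Sketch: compute the strides of an (N, C, H, W)-shaped tensor under the two
# memory formats PyTorch distinguishes. Illustrative helpers, not torch API.

def contiguous_strides(n, c, h, w):
    # torch.contiguous_format: row-major over (N, C, H, W); n never enters
    # the strides because it only scales the outermost dimension
    return (c * h * w, h * w, w, 1)

def channels_last_strides(n, c, h, w):
    # torch.channels_last: data laid out as NHWC, but the logical shape
    # stays (N, C, H, W), so only the strides change
    return (h * w * c, 1, w * c, c)

# The input from the snippet above: (1, 3, 32, 32) in channels_last
print(channels_last_strides(1, 3, 32, 32))  # (3072, 1, 96, 3)
# Before this fix, the (1, 3, 4, 4) output came back contiguous instead:
print(contiguous_strides(1, 3, 4, 4))       # (48, 16, 4, 1)
# whereas a matching channels_last output would have strides:
print(channels_last_strides(1, 3, 4, 4))    # (48, 1, 12, 3)
```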

@pytorch-bot

pytorch-bot Bot commented Dec 8, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90470

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failures, 1 Pending

As of commit f2ad2d9:

FLAKY - The following jobs failed but were likely due to flakiness present on master:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot Bot added the release notes: nn release notes category label Dec 8, 2022
@github-actions github-actions Bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Dec 8, 2022
@vfdev-5 vfdev-5 changed the title Fixed output memory format for bicubic2d Fixed output memory format mismatch for bicubic2d Dec 8, 2022
@vfdev-5 vfdev-5 force-pushed the fix-bicubic-out-mf branch from 176951a to 3abb3fc Compare December 8, 2022 16:01
Member

@NicolasHug NicolasHug left a comment


Thanks @vfdev-5 !

@vfdev-5
Contributor Author

vfdev-5 commented Dec 9, 2022

The XLA job failure seems to be related: https://github.com/pytorch/pytorch/actions/runs/3649949448/jobs/6165647497#step:10:12105

@JackCaoG can you help with debugging this issue, please?

@JackCaoG
Collaborator

JackCaoG commented Dec 9, 2022

hmm, it seems like the test just takes too long to compile, so it was killed...

@JackCaoG
Collaborator

JackCaoG commented Dec 9, 2022

I opened pytorch/xla#4308; if the GPU test works, then I think it might be a CPU compiler issue. It is a bit concerning that compilation time significantly increased with this change, though.

@JackCaoG
Collaborator

JackCaoG commented Dec 9, 2022

@vfdev-5 Do you mind rebasing this PR? I was not able to build on our CI since an offending PR was merged on the PyTorch side.

@vfdev-5
Contributor Author

vfdev-5 commented Dec 12, 2022

@JackCaoG
Collaborator

@wonjoolee95 can you follow up on this one? I triggered the GPU CI in pytorch/xla#4308 again. If the GPU test passes, we can conclude that this PR somehow generates a graph that's hard to compile for XLA:CPU, which I think is fine; we can disable the test for XLA devices on either the PyTorch end or the XLA end. If the GPU test also fails with a compilation timeout, I think we have a bigger problem, since we do have real users for it.

@wonjoo-wj
Collaborator

> @wonjoolee95 can you follow up on this one? I triggered the GPU CI in pytorch/xla#4308 again. If the GPU test passes, we can conclude that this PR somehow generates a graph that's hard to compile for XLA:CPU, which I think is fine; we can disable the test for XLA devices on either the PyTorch end or the XLA end. If the GPU test also fails with a compilation timeout, I think we have a bigger problem, since we do have real users for it.

Sounds good, I'll monitor the GPU test CI and keep this thread updated.

@wonjoo-wj
Collaborator

Seems like XLA's GPU CI is stuck as well; specifically, we're getting an error (I'm guessing a timeout) for the test test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bilinear_xla.

@vfdev-5
Contributor Author

vfdev-5 commented Dec 13, 2022

Other related failures:

```
======================================================================
ERROR [0.351s]: test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bicubic_cuda_float32 (__main__.TestMetaCUDA)
...
  File "/var/lib/jenkins/workspace/test/test_meta.py", line 357, in test_assert
    raise RuntimeError(f"output {i}: {msg_callable(msg)}")
RuntimeError: output 0: meta disagrees with real impl:
aten.upsample_bicubic2d.default(
  tensor(..., device='meta', size=(2, 3, 4, 4)) stride=(48, 1, 12, 3),
  [3, 3],
  True,

) = (
  tensor(..., device='meta', size=(2, 3, 3, 3)) stride=(27, 9, 3, 1)
)
but real stride was (27, 1, 9, 3)

======================================================================
FAIL [0.097s]: test_upsamplingBiMode2d_antialias_False_align_corners_False_mode_bicubic_cuda (__main__.TestNNDeviceTypeCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2053, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    result = test(self, **param_kwargs)
  File "/var/lib/jenkins/workspace/test/test_nn.py", line 9362, in test_upsamplingBiMode2d
    self.assertEqual(a_cuda.grad, a_cpu.grad)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2859, in assertEqual
    assert_equal(
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1270, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 44 / 48 (91.7%)
Greatest absolute difference: 1.3735964907929827 at index (1, 0, 1, 2) (up to 1e-07 allowed)
Greatest relative difference: 37.21617048686028 at index (0, 1, 1, 1) (up to 1e-07 allowed)

======================================================================
FAIL [0.095s]: test_upsamplingBiMode2d_antialias_False_align_corners_True_mode_bicubic_cuda (__main__.TestNNDeviceTypeCUDA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2053, in wrapper
    method(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 378, in instantiated_test
    result = test(self, **param_kwargs)
  File "/var/lib/jenkins/workspace/test/test_nn.py", line 9362, in test_upsamplingBiMode2d
    self.assertEqual(a_cuda.grad, a_cpu.grad)
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2859, in assertEqual
    assert_equal(
  File "/opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py", line 1270, in assert_equal
    raise error_metas[0].to_error(msg)
AssertionError: Tensor-likes are not close!

Mismatched elements: 12 / 48 (25.0%)
Greatest absolute difference: 2.710664614723657 at index (1, 0, 1, 2) (up to 1e-07 allowed)
Greatest relative difference: inf at index (0, 0, 0, 3) (up to 1e-07 allowed)

----------------------------------------------------------------------
Ran 2303 tests in 180.681s

FAILED (failures=2, skipped=75, expected failures=10)
```

I'll investigate these and keep this thread updated.
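The stride disagreement in the first failure above is exactly a contiguous-vs-channels_last mismatch: the meta kernel produced row-major strides while the real kernel preserved channels_last. A small sketch (plain Python; the helper name is illustrative) verifying this for the (2, 3, 3, 3) output in the log:

```python
# The meta kernel reported stride (27, 9, 3, 1) for a (2, 3, 3, 3) output,
# while the real kernel produced (27, 1, 9, 3). Check which format each is.

def strides_for(shape, channels_last=False):
    n, c, h, w = shape
    if channels_last:
        return (h * w * c, 1, w * c, c)  # NHWC layout, NCHW logical shape
    return (c * h * w, h * w, w, 1)      # plain row-major NCHW

shape = (2, 3, 3, 3)
print(strides_for(shape))                      # (27, 9, 3, 1) -> what meta reported
print(strides_for(shape, channels_last=True))  # (27, 1, 9, 3) -> real kernel's output
```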

@linux-foundation-easycla

linux-foundation-easycla Bot commented Dec 13, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: vfdev-5 / name: vfdev (3abb3fc45c549714de5b2c4f2d1a7fd6a9b37a8e, 449de510c66390c62e737049f0bf25232a9a69af, 2335adec6c85cb2c42f1b9904941394d87ad9ad2, 0dae71fdfb2c9c030ba15b79f54a962030a87465)

Comment thread on torch/_decomp/decompositions.py (outdated)
@vfdev-5 vfdev-5 closed this Dec 13, 2022
@vfdev-5 vfdev-5 reopened this Dec 13, 2022
@vfdev-5 vfdev-5 closed this Dec 13, 2022
@vfdev-5 vfdev-5 reopened this Dec 13, 2022
@vfdev-5
Contributor Author

vfdev-5 commented Dec 14, 2022

@JackCaoG In the recent commit I fixed the issue this PR had with the grad output memory format; I reverted the code and it fixed the issues mentioned in #90470 (comment), but CI is still failing on XLA.

@JackCaoG
Collaborator

JackCaoG commented Dec 14, 2022

@wonjoolee95 can follow up; we can dump the HLO and maybe check with XLA folks why it took so long to compile. This back and forth might take a few days; is this PR urgent?

@vfdev-5
Contributor Author

vfdev-5 commented Dec 14, 2022

@JackCaoG thanks for the feedback; it is not urgent and we can wait for some time. However, #90771 depends on this code change.

@JackCaoG
Collaborator

Thanks for the context, we will try to move a bit faster to unblock this PR. Thank you for your patience!

@wonjoo-wj
Collaborator

wonjoo-wj commented Dec 19, 2022

@vfdev-5, apologies for the delay. From XLA's side, we have disabled the test for now. Could you update the XLA pin (https://github.com/pytorch/pytorch/blob/master/.github/ci_commit_pins/xla.txt) to 66c2c15df992c9a683e3b08811a7c08ebeda0a2f and re-trigger the CI? As you do that, rebasing this PR onto master would be helpful, too. Thanks!

Please refer to Jack's comment below to use onlyNativeDeviceTypes (ex: https://github.com/pytorch/pytorch/blob/master/test/test_torch.py#L1277). Thanks!

@JackCaoG
Collaborator

Maybe you can use onlyNativeDeviceTypes to prevent it from running on XLA devices, so you don't need to update the XLA pin.

@lezcano lezcano added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 12, 2023
Collaborator

@lezcano lezcano left a comment


Cool!

@lezcano
Collaborator

lezcano commented Jan 12, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: This PR is too stale; the last push date was more than 3 days ago. Please rebase and try again. You can rebase by leaving the following comment on this PR:
@pytorchbot rebase

Details for Dev Infra team Raised by workflow job

@vfdev-5
Contributor Author

vfdev-5 commented Jan 12, 2023

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased fix-bicubic-out-mf onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix-bicubic-out-mf && git pull --rebase)

@lezcano
Collaborator

lezcano commented Jan 12, 2023

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

Merge failed

Reason: 2 additional jobs have failed; the first few of them are: trunk, trunk / linux-focal-rocm5.3-py3.8 / test (default, 1, 2, linux.rocm.gpu)

Details for Dev Infra team Raised by workflow job

@lezcano
Collaborator

lezcano commented Jan 12, 2023

@pytorchbot merge -f "timed out, unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Pull Request resolved: pytorch#90470
Approved by: https://github.com/NicolasHug, https://github.com/lezcano