[CUDA] Large tensor maxpool crash fix#165374

Closed
Isalia20 wants to merge 6 commits into pytorch:main from Isalia20:cuda-maxpool-int64

Conversation

@Isalia20
Collaborator

@Isalia20 Isalia20 commented Oct 13, 2025

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Oct 13, 2025
@pytorch-bot

pytorch-bot bot commented Oct 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165374

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 2ec7dd1 with merge base 01738a3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Isalia20 Isalia20 added the topic: bug fixes (topic category) and module: cuda (Related to torch.cuda, and CUDA support in general) labels Oct 13, 2025
@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 13, 2025

static __device__ inline int p_start(int size, int pad, int kernel, int dilation, int stride) {
return (size + pad < ((kernel - 1) * dilation + 1)) ? 0 : (size + pad - ((kernel - 1) * dilation + 1)) / stride + 1;
}
template <typename T>
Collaborator

nit: index_t

Collaborator Author

updated

int64_t in_stride_n, int64_t in_stride_c,
int64_t in_stride_h, int64_t in_stride_w)
{
const int64_t int_max = std::numeric_limits<int>::max();
Collaborator

constexpr?

Collaborator Author

updated
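For context, the `int_max` constant above exists because the failing tensor in this PR has more elements than a 32-bit signed int can index. A quick plain-Python sanity check (a sketch only; dimensions taken from the regression test, with `INT32_MAX` standing in for `std::numeric_limits<int>::max()`):

```python
# Dimensions from the regression test for issue pytorch/pytorch#165297.
N, C, H, W = 70, 64, 512, 960
numel = N * C * H * W

INT32_MAX = 2**31 - 1  # std::numeric_limits<int>::max()

print(numel)              # 2202009600
print(numel > INT32_MAX)  # True -> 32-bit flat indexing overflows
```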

test/test_nn.py Outdated
# https://github.com/pytorch/pytorch/issues/165297
N, C, H, W = 70, 64, 512, 960 # dims to extend > int32
device = torch.device("cuda")
x_cuda = torch.randn(N, C, H, W, device=device)
Collaborator

Do memory requirements go down if a narrower dtype such as half is used?

Collaborator Author

Yes, the memory requirement decreased, and I updated the test to use float16. Initially I wanted to trigger the same illegal memory access as in the issue with float32, but testing in float16 should be sufficient as well, since we compare against the NCHW-format result for correctness.
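The memory saving from switching the test to float16 can be estimated with simple arithmetic (a rough sketch; only the single CUDA input tensor is counted, not the channels_last copy or the pooling output):

```python
N, C, H, W = 70, 64, 512, 960
numel = N * C * H * W  # 2202009600 elements

GiB = 1024**3
print(f"float32: {numel * 4 / GiB:.2f} GiB")  # 8.20 GiB
print(f"float16: {numel * 2 / GiB:.2f} GiB")  # 4.10 GiB
```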

@pytorch-bot pytorch-bot bot removed the ciflow/trunk (Trigger trunk jobs on your pull request) and ciflow/h100 labels Oct 14, 2025
@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2025
@eqy
Collaborator

eqy commented Oct 14, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Tried to rebase and push PR #165374, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@Isalia20
Collaborator Author

@pytorchbot rebase -b main

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased cuda-maxpool-int64 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout cuda-maxpool-int64 && git pull --rebase)

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Oct 14, 2025
@Isalia20
Collaborator Author

@pytorchbot merge

@pytorch-bot

pytorch-bot bot commented Oct 15, 2025

Pull workflow has not been scheduled for the PR yet. This could be because the author doesn't have permission to run it, or because skip-checks keywords were added to the PR/commits; aborting merge. Please get/give approval for the workflows and/or remove skip-ci decorators before the next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@Isalia20
Collaborator Author

@eqy Need an approval of the workflows here, and then we can merge, I guess

Comment on lines +41 to +44
template <typename index_t>
__device__ inline index_t dmin(index_t a, index_t b) {
return a <= b ? a : b;
}
Contributor

What's wrong with std::min?

Collaborator Author

updated


template <typename index_t>
static __device__ inline index_t p_start(index_t size, int pad, int kernel, int dilation, int stride) {
const index_t kernel_extent = static_cast<index_t>((kernel - 1) * dilation + 1);
Contributor

Nit

Suggested change
const index_t kernel_extent = static_cast<index_t>((kernel - 1) * dilation + 1);
const auto kernel_extent = static_cast<index_t>((kernel - 1) * dilation + 1);

Collaborator Author

updated
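For readers following the index math, here is a plain-Python transcription of the templated `p_start` above (illustrative only; the CUDA version is authoritative). Given an input position, it returns the first pooled-output index whose window covers that position:

```python
def p_start(size, pad, kernel, dilation, stride):
    # First output index whose pooling window covers input position `size`.
    kernel_extent = (kernel - 1) * dilation + 1
    if size + pad < kernel_extent:
        return 0
    # Floor division matches the C truncation here (operands are non-negative).
    return (size + pad - kernel_extent) // stride + 1

# kernel=3, dilation=1, stride=2, pad=1:
print(p_start(0, 1, 3, 1, 2))  # 0 -> input 0 is covered starting at output 0
print(p_start(4, 1, 3, 1, 2))  # 2 -> input 4 is first covered by output 2
```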

"fractional_max_pool2d requires output_ratio to either be a single Int or tuple of Ints."):
res = arg_class(*arg_3)

@unittest.skipIf(not TEST_CUDA, "CUDA not available")
Contributor

This makes me sad; we really should use the device argument and the @onlyCUDA decorator

Collaborator Author

updated

Comment on lines +7504 to +7505
device = torch.device("cuda")
x_cuda = torch.randn(N, C, H, W, device=device, dtype=torch.float16)
Contributor

Nit

Suggested change
device = torch.device("cuda")
x_cuda = torch.randn(N, C, H, W, device=device, dtype=torch.float16)
x_cuda = torch.randn(N, C, H, W, device="cuda", dtype=torch.float16)

Collaborator Author

updated
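The int64 stride parameters (`in_stride_n`, `in_stride_c`, `in_stride_h`, `in_stride_w`) are central to the fix: in channels_last (NHWC) layout, the largest flat offset into the test tensor already exceeds `INT_MAX`. A stride-arithmetic sketch in plain Python (no torch required; strides written out by hand for illustration):

```python
N, C, H, W = 70, 64, 512, 960

# channels_last (NHWC) strides, in elements: C is the innermost dimension.
stride_c = 1
stride_w = C
stride_h = W * C
stride_n = H * W * C

# Flat offset of the very last element in the tensor.
max_offset = (N - 1) * stride_n + (H - 1) * stride_h + (W - 1) * stride_w + (C - 1) * stride_c
print(max_offset)              # 2202009599 (== N*C*H*W - 1)
print(max_offset > 2**31 - 1)  # True -> int offsets would overflow
```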

@Isalia20 Isalia20 added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 15, 2025
@Isalia20
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025

Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cuda (Related to torch.cuda, and CUDA support in general), open source, release notes: nn (release notes category), topic: bug fixes (topic category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] MaxPool2d with channels_last + bfloat16 on CUDA produces NaNs for large tensors

5 participants