Add deterministic path for CUDA cumsum #136224
kurtamohler wants to merge 2 commits into pytorch:main

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/136224
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures, 1 Unrelated Failure as of commit 0dd2ef2 with merge base 5516ac5.
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed 55e2ea4 to 9afa925
Force-pushed 5c2fa20 to eccd643
Force-pushed eccd643 to 6ef5ddc
test fail looks real
I'm having trouble figuring out how to fix that. It looks like the graph context given to
Force-pushed 6ef5ddc to e6b9d31
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable), trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable). Details for Dev Infra team: raised by workflow job.
@kurtamohler your PR has been successfully reverted.
This reverts commit d1bb8e8. Reverted #136224 on behalf of https://github.com/atalman due to Break internal CI ([comment](#136224 (comment)))
Whelp. I guess we have to put the logic in C++. Maybe we should just port the Python decomp to C++ then.
Sounds good, working on it.
Force-pushed 8865cb4 to 1189a5d
Force-pushed 1189a5d to 0dd2ef2
I've ported the decomp to C++.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
    return at::sum_out(result, self.unsqueeze(0), /*dim=*/IntArrayRef{0});
  }
  self = self.unsqueeze(dim + 1);
  Tensor rg = at::arange(self.size(dim), c10::TensorOptions().device(self.device()));
this is quadratic memory usage for mask and for self, I don't think this solves the problem of large dim as it will OOM (perf is also quadratic obv)
Oh good point, thanks for bringing that up. I've reopened the issue for large inputs
yeah this is making the deterministic mode quite unusable imo, e.g.
torch.cumsum(torch.rand([1, 1, 128256], device="cuda"), dim=-1)
This is using 60GB memory.
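The quoted figure is consistent with a single quadratic intermediate: for n = 128256, one n-by-n float32 tensor alone is roughly 61 GiB (a back-of-the-envelope check, assuming float32 as in the example above):

```python
# Rough check of the ~60GB figure: one n x n float32 intermediate
# for n = 128256, as in the torch.rand([1, 1, 128256]) example.
n = 128256
bytes_fp32 = n * n * 4          # 4 bytes per float32 element
gib = bytes_fp32 / 2**30        # convert bytes to GiB
print(round(gib, 1))            # ~61.3 GiB for a single quadratic tensor
```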
The alternative is a hard error in deterministic mode, so the current state is still better? Or would you prefer this PR reverted, @xw285cornell?
There is often a funny situation where we buggily gave nondeterministic results but people prefer that over OOM/error 🤣
the difference is only in non-deterministic warn mode, where previously we would warn and take the non-deterministic path, and now we will OOM. For non-deterministic error mode, if someone preferred the non-deterministic result, that's on them; error means error.
It looks like some internal peeps are complaining, so I'm gonna yank this and we can talk about it more
So.... what should we do? Is there a torch.compile version that we can ship instead?
@pytorchbot revert -c nosignal -m "larger memory usage apparently not acceptable"
@pytorchbot successfully started a revert job. Check the current status here.
@kurtamohler your PR has been successfully reverted.
This reverts commit 383eba5. Reverted #136224 on behalf of https://github.com/ezyang due to larger memory usage apparently not acceptable ([comment](#136224 (comment)))
Replaced by #140887
Change cumsum to call its decomposition when use_deterministic_algorithms(True) and the input is CUDA.
Fixes #89492
Fixes #75240
cc @mruberry
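For reference, the behavior described above is driven by the standard deterministic-algorithms toggle. A minimal illustration of the API (CPU here, since the PR's change only reroutes CUDA inputs; this snippet just shows the toggle and the expected cumsum result):

```python
import torch

# Request deterministic algorithms; with this PR, CUDA cumsum is routed
# through the decomposition instead of the nondeterministic kernel.
torch.use_deterministic_algorithms(True)

x = torch.tensor([1.0, 2.0, 3.0])
out = torch.cumsum(x, dim=0)
print(out)  # tensor([1., 3., 6.])

# Restore the default setting.
torch.use_deterministic_algorithms(False)
```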