
compiled autograd: default fakeify backward inputs with static shapes instead of duck sizing#133581

Closed
bdhirsh wants to merge 1 commit into gh/bdhirsh/607/base from gh/bdhirsh/607/head

Conversation

@bdhirsh
Collaborator

@bdhirsh bdhirsh commented Aug 15, 2024

@pytorch-bot

pytorch-bot bot commented Aug 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133581

Note: Links to docs will display an error until the docs builds have been completed.

❌ 40 New Failures, 1 Cancelled Job, 1 Unrelated Failure

As of commit 64d034e with merge base 454713f (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xmfan
Member

xmfan commented Aug 15, 2024

I still don't fully understand the change; changing the fakification logic shouldn't do anything:

  • Compiled autograd's logic for marking things as dynamic all lives here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/dynamo/python_compiled_autograd.cpp#L210. We don't use these fake tensors to determine dynamism; they only bind to proxies as we trace ops into the graph.
  • After tracing, we call torch.compile using the graph module and the real inputs. The fake tensors created by compiled autograd are just thrown away. Dynamo logic then dictates what should be marked dynamic.
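To make the distinction in the PR title concrete, here is a pure-Python sketch (not the PyTorch implementation) contrasting "duck sizing", where dimensions with equal concrete sizes are assigned the same symbol, with the static default this PR proposes, where sizes simply stay concrete. The function names and symbol format are illustrative assumptions.

```python
def duck_size(shape, table=None):
    """Duck sizing sketch: dims with equal concrete sizes share one symbol."""
    table = {} if table is None else table
    out = []
    for dim in shape:
        if dim not in table:
            table[dim] = f"s{len(table)}"  # fresh symbol per distinct size
        out.append(table[dim])
    return out, table

def static_size(shape):
    """Static fakeification sketch: every size stays concrete."""
    return list(shape)

syms, _ = duck_size((4, 4, 8))
print(syms)                    # ['s0', 's0', 's1'] -- equal dims share a symbol
print(static_size((4, 4, 8)))  # [4, 4, 8] -- no symbols introduced
```

Under duck sizing, the two size-4 dims above would be forced to stay equal in the traced graph; with static shapes, no such guard is created at trace time and dynamism is decided later (e.g. by Dynamo).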

@ezyang
Contributor

ezyang commented Aug 18, 2024

@bdhirsh I have actually completely forgotten what exactly we talked about in our 1:1, so you had better write it down here :P

@xmfan
Member

xmfan commented Aug 21, 2024

The compiled autograd failures are both due to the GmWrapper inputs not being unflattened properly in the previous PR. Tests passed for me if you do something like `out = PropagateUnbackedSymInts(mod_).run(*mod.unflatten_fn(args))`

If the traced graph is always the same between tracing with static FakeTensors vs. dynamic ones (is this true?), then this approach should be okay.
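The flatten/unflatten pattern behind the fix above can be sketched as follows. This is a hedged illustration, not PyTorch's GmWrapper: the idea is that the wrapper flattens nested inputs into one flat argument list, and an `unflatten_fn` (the name is taken from the comment; the spec format here is an assumption) re-nests them before re-running the module.

```python
def make_unflatten_fn(spec):
    """Build an unflatten_fn from a list of group lengths.

    E.g. spec [2, 3] turns [a, b, c, d, e] into [[a, b], [c, d, e]],
    mimicking how a GmWrapper-style unflatten restores nested inputs.
    """
    def unflatten_fn(flat):
        out, i = [], 0
        for n in spec:
            out.append(list(flat[i:i + n]))
            i += n
        return out
    return unflatten_fn

unflatten = make_unflatten_fn([2, 3])
print(unflatten([1, 2, 3, 4, 5]))  # [[1, 2], [3, 4, 5]]
```

Forgetting this re-nesting step (i.e. running the module on the flat list directly) is exactly the kind of mismatch the comment describes.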

@albanD albanD removed their request for review August 21, 2024 21:38
@github-actions
Contributor

github-actions bot commented Nov 5, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Nov 5, 2024
@github-actions github-actions bot closed this Dec 5, 2024
@github-actions github-actions bot deleted the gh/bdhirsh/607/head branch January 5, 2025 02:09