Add Python decomposition for quantile/nanquantile to fix torch.export by tugsbayasgalan · Pull Request #174787 · pytorch/pytorch

tugsbayasgalan · 2026-02-11T18:01:43Z

Stack from ghstack (oldest at bottom):

-> Add Python decomposition for quantile/nanquantile to fix torch.export #174787

The C++ CompositeImplicitAutograd kernel for quantile calls
at::is_scalar_tensor_true (which uses at::equal, tagged
data_dependent_output) for input validation. During torch.export,
FakeTensor tracing cannot evaluate data-dependent ops, so export
fails with DataDependentOutputException.
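
The failure mode described above can be illustrated with a toy model. This is only a sketch in plain Python, not PyTorch code: `MetadataOnlyTensor`, `DataDependentError`, `validate_by_value`, and `validate_by_shape` are hypothetical names invented for illustration, standing in for a fake tensor (shape known, no data) and for the two kinds of input validation.

```python
class DataDependentError(RuntimeError):
    """Toy stand-in for the error raised when tracing needs a concrete value."""


class MetadataOnlyTensor:
    """Toy stand-in for a fake tensor: it carries a shape but holds no data."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def all_in_unit_interval(self):
        # Answering this would require reading element values, which do not exist.
        raise DataDependentError("cannot inspect values of a metadata-only tensor")


def validate_by_value(q):
    # Value-based validation: reads data, so it cannot run during tracing.
    if not q.all_in_unit_interval():
        raise ValueError("q must be in [0, 1]")


def validate_by_shape(q):
    # Shape-based validation: consults metadata only, so tracing can proceed.
    if len(q.shape) > 1:
        raise ValueError("q must be a scalar or 1D tensor")


q = MetadataOnlyTensor(shape=(3,))
validate_by_shape(q)  # passes: the shape is known without any data

try:
    validate_by_value(q)
    outcome = "validated"
except DataDependentError as exc:
    outcome = f"tracing failed: {exc}"
print(outcome)
```

In this toy model, the shape check succeeds and the value check fails, mirroring the distinction the description draws between ops that work on FakeTensors and ops that are data-dependent.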

Register Python decompositions with py_impl(CompositeImplicitAutograd)
so the Python implementation runs instead of the C++ kernel during
tracing. The Python version uses only sort, gather, lerp, and other
ops that work with FakeTensors.
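
As a rough illustration of the sort, index, and lerp approach, here is a minimal plain-Python sketch of a linearly interpolated quantile over a 1-D list. It is not the code from this PR and does not use PyTorch; the function names `lerp`, `quantile_1d`, and `nanquantile_1d` are assumptions made for the example, and only linear interpolation is shown. Because `q` is a plain float here, its range check needs no tensor data.

```python
import math


def _check_q(q):
    # q is a Python float, so validating it does not read any tensor values.
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")


def lerp(start, end, weight):
    # Linear interpolation between start and end.
    return start + weight * (end - start)


def quantile_1d(values, q):
    """Linearly interpolated quantile of a non-empty list without NaNs."""
    _check_q(q)
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)                  # the "sort" step
    position = q * (len(ordered) - 1)         # fractional rank of the quantile
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    # the "gather" step picks the two neighbours; "lerp" blends them
    return lerp(ordered[lower], ordered[upper], position - lower)


def nanquantile_1d(values, q):
    """Same as quantile_1d, but NaN entries are dropped first."""
    _check_q(q)
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return math.nan  # nothing left to rank
    return quantile_1d(kept, q)


median = quantile_1d([3.0, 1.0, 2.0, 4.0], 0.5)
nan_median = nanquantile_1d([3.0, math.nan, 1.0], 0.5)
print(median, nan_median)
```

Every step depends only on the length of the input and on ordinary arithmetic, which is the property that lets an equivalent tensor formulation be traced without inspecting values.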

Authored with Claude.

[ghstack-poisoned]

pytorch-bot · 2026-02-11T18:01:47Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/174787

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 32 New Failures, 7 Unrelated Failures

As of commit 9e9bae6 with merge base f5fbedb:

NEW FAILURES - The following jobs have failed:

Lint / lintrunner-noclang-all / linux-job (gh)
RuntimeError: Command docker exec -t f4faae9a670ca9ecaad13c51f14c25b9de2c37c66aefe6e44e012cb5c6fe7841 /exec failed with exit code 1
pull / linux-jammy-py3.10-clang15 / test (crossref, 1, 2, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (crossref, 2, 2, linux.2xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (default, 1, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (default, 3, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (default, 4, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.10-clang15 / test (default, 5, 5, linux.4xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang18-asan / test (default, 1, 7, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang18-asan / test (default, 2, 7, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang18-asan / test (default, 4, 7, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_symbolic_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.10-clang18-asan / test (default, 5, 7, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_symbolic_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-clang18-asan / test (default, 7, 7, linux.4xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.10-gcc11 / test (default, 2, 5, linux.2xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.10-gcc11 / test (default, 3, 5, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.10-gcc11 / test (default, 4, 5, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.10-gcc11 / test (default, 5, 5, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (crossref, 1, 2, linux.2xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (crossref, 2, 2, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (default, 1, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (default, 2, 5, linux.4xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (default, 3, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.14-clang15 / test (default, 5, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.14t-clang15 / test (crossref, 1, 2, linux.2xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.14t-clang15 / test (crossref, 2, 2, linux.2xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.14t-clang15 / test (default, 1, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_symbolic_exhaustive_quantile_cpu_float32
pull / linux-jammy-py3.14t-clang15 / test (default, 2, 5, linux.4xlarge) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
pull / linux-jammy-py3.14t-clang15 / test (default, 3, 5, linux.4xlarge) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
trunk / macos-py3-arm64 / build (gh)
trunk / win-vs2022-cpu-py3 / test (default, 1, 4, windows.4xlarge.nonephemeral) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_nanquantile_cpu_float32
trunk / win-vs2022-cpu-py3 / test (default, 2, 4, windows.4xlarge.nonephemeral) (gh)
test/test_ops_unbacked.py::TestOpsUnbackedCPU::test_unbacked_op_db_nanquantile_cpu_float32
trunk / win-vs2022-cpu-py3 / test (default, 3, 4, windows.4xlarge.nonephemeral) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_symbolic_exhaustive_quantile_cpu_float32
trunk / win-vs2022-cpu-py3 / test (default, 4, 4, windows.4xlarge.nonephemeral) (gh)
test/functorch/test_aotdispatch.py::TestEagerFusionOpInfoCPU::test_aot_autograd_exhaustive_quantile_cpu_float32

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

inductor / unit-test / inductor-test / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu) (gh) (disabled by #146806 but the issue was closed recently and a rebase is needed to make it pass)
test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced
trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (distributed, 1, 3, linux.g4dn.12xlarge.nvidia.gpu) (gh) (disabled by #146806 but the issue was closed recently and a rebase is needed to make it pass)
test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced
trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (distributed, 1, 3, linux.g4dn.12xlarge.nvidia.gpu) (gh) (disabled by #146806 but the issue was closed recently and a rebase is needed to make it pass)
test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced
trunk / linux-jammy-rocm-py3.10 / test (default, 1, 6, linux.rocm.gpu.gfx950.1) (gh) (disabled by #173717 but the issue was closed recently and a rebase is needed to make it pass)
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_preserves_metadata_cache_cuda_float32
trunk / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.gfx950.1) (gh) (disabled by #173620 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_mutated_autotuning_cuda
trunk / linux-jammy-rocm-py3.10 / test (default, 4, 6, linux.rocm.gpu.gfx950.1) (gh) (disabled by #173619 but the issue was closed recently and a rebase is needed to make it pass)
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_autotuning_cuda
trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 3, linux.rocm.gpu.gfx950.4) (gh) (disabled by #146806 but the issue was closed recently and a rebase is needed to make it pass)
test/distributed/test_c10d_functional_native.py::CompileTest::test_inductor_all_gather_into_tensor_coalesced

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 7c2aef2 Pull Request resolved: #174787

pytorch-bot · 2026-02-11T18:01:50Z

This PR needs a `release notes:` label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


tugsbayasgalan · 2026-02-12T19:48:11Z

@pytorchbot merge -i

pytorchmergebot · 2026-02-12T19:50:26Z

Merge started

Your change will be merged while ignoring the following 1 checks: inductor / unit-test / inductor-test / test (inductor_distributed, 1, 1, linux.g5.12xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2026-02-12T19:55:55Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Lint / lintrunner-noclang-all / linux-job

Dig deeper by viewing the failures on hud

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

tugsbayasgalan · 2026-02-13T18:17:38Z

@pytorchbot merge -f "Lintrunner error is unrelated"

pytorchmergebot · 2026-02-13T18:19:33Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

weifengpy · 2026-02-14T15:06:06Z

Change the tests as well? Otherwise CI fails.

This PR added a Python decomposition for quantile/nanquantile and removed some xfail markers (the test_make_fx_* and test_proxy_tensor ones), but it didn't remove the xfail markers for:
- test_ops_unbacked::test_unbacked_op_db_nanquantile_cpu_float32
- test_aotdispatch::test_aot_autograd_disable_functionalization_exhaustive_nanquantile_cpu_float32
- test_aotdispatch::test_aot_autograd_disable_functionalization_exhaustive_quantile_cpu_float32

seemethere · 2026-02-14T18:50:15Z

@pytorchbot revert -c nosignal -m "Looks like this is causing failures upstream see https://hud.pytorch.org/pytorch/pytorch/commit/4504c3dcee3c02886fae3340a8ee268717c6cb32"

pytorchmergebot · 2026-02-14T18:52:10Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

…h.export (#174787)" This reverts commit 4504c3d. Reverted #174787 on behalf of https://github.com/seemethere due to Looks like this is causing failures upstream see https://hud.pytorch.org/pytorch/pytorch/commit/4504c3dcee3c02886fae3340a8ee268717c6cb32 ([comment](#174787 (comment)))

pytorchmergebot · 2026-02-14T18:52:18Z

@tugsbayasgalan your PR has been successfully reverted.

github-actions · 2026-04-15T23:47:13Z

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Fix quantile meta impl

e3641e8

[ghstack-poisoned]

pytorch-bot Bot added the ciflow/inductor label Feb 11, 2026

tugsbayasgalan added a commit that referenced this pull request Feb 11, 2026

Fix quantile meta impl

bb85324

ghstack-source-id: 7c2aef2 Pull Request resolved: #174787

tugsbayasgalan changed the title ~~Fix quantile meta impl~~ Add Python decomposition for quantile/nanquantile to fix torch.export Feb 11, 2026

pianpwk approved these changes Feb 11, 2026


tugsbayasgalan added the topic: not user facing topic category label Feb 12, 2026

tugsbayasgalan requested a review from mruberry as a code owner February 12, 2026 16:46

tugsbayasgalan added a commit that referenced this pull request Feb 12, 2026

Fix quantile meta impl

018259d

ghstack-source-id: 8c9f850 Pull Request resolved: #174787

pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 12, 2026

pytorchmergebot added the merging label Feb 12, 2026

pytorchmergebot removed the merging label Feb 12, 2026

pytorchmergebot added the merging label Feb 13, 2026

pytorchmergebot closed this in 4504c3d Feb 13, 2026

pytorchmergebot added Merged and removed merging labels Feb 13, 2026

weifengpy mentioned this pull request Feb 14, 2026

[DTensor] enable single dim strategy for mm and bmm #172385

Closed

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Feb 14, 2026

pytorchmergebot reopened this Feb 14, 2026

github-actions Bot added the Stale label Apr 15, 2026

tugsbayasgalan wants to merge 2 commits into gh/tugsbayasgalan/120/base from gh/tugsbayasgalan/120/head

5 participants
