[inductor] Add TMA support for lazy Triton kernel compilation #175548
desertfire wants to merge 15 commits into gh/desertfire/654/base from
Conversation
Summary: Host-side TMA descriptors (StableTMADescriptor) are now handled in the lazy compile path. The generated C++ wrapper receives both the TMA descriptor and the underlying tensor as parameters. On the first call, the tensor is passed to Python, where _wrap_tma_args reconstructs TensorDescriptor.from_tensor() for Triton's autotuner. On cached launches, the StableTMADescriptor fields are unpacked directly into the kernel launch args. Scratch space is now allocated dynamically at runtime using sizes from the autotuning result. Authored with Claude.
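For readers unfamiliar with the flow, here is a minimal Python sketch of the first-call path the summary describes. `TensorDescriptor.from_tensor` is Triton's API as named in the summary; the `StableTMADescriptor` layout and the helper's exact shape are assumptions for illustration, not the PR's actual code.

```python
# Sketch (not the PR's exact code) of the first-call path: the C++ wrapper
# hands both the descriptor and its underlying tensor to Python, and
# _wrap_tma_args rebuilds a TensorDescriptor so Triton's autotuner can
# benchmark with a real host-side TMA argument.
from dataclasses import dataclass

import torch
from triton.tools.tensor_descriptor import TensorDescriptor


@dataclass
class StableTMADescriptor:  # hypothetical stand-in for the inductor type
    tensor: torch.Tensor      # underlying storage the descriptor views
    block_shape: list[int]    # TMA tile shape fixed at compile time


def _wrap_tma_args(args: list) -> list:
    """Rebuild TensorDescriptor objects for Triton's autotuner (sketch)."""
    wrapped = []
    for arg in args:
        if isinstance(arg, StableTMADescriptor):
            # First call: give Triton the tensor; it constructs the
            # device-side TMA descriptor itself.
            wrapped.append(
                TensorDescriptor.from_tensor(arg.tensor, arg.block_shape)
            )
        else:
            wrapped.append(arg)
    return wrapped
```

On cached launches this Python round-trip is skipped: the wrapper unpacks the descriptor's fields straight into the kernel launch args, so a sketch like the above only runs on the first call per kernel.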
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175548
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures) As of commit d306ef2 with merge base b180c2f:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
sig_type = signature.get(key, "")
if isinstance(sig_type, str) and signature_is_tma_desc(sig_type):
    if isinstance(
        raw_arg, (TMADescriptorExperimental, TMADescriptorStable)
    ):
```
I will raise an AssertionError in the else branch.
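For context, a sketch of the hunk with the change described above. `signature_is_tma_desc` and the two descriptor classes come from the diff; the surrounding scaffolding and the error message are assumptions.

```python
sig_type = signature.get(key, "")
if isinstance(sig_type, str) and signature_is_tma_desc(sig_type):
    if isinstance(raw_arg, (TMADescriptorExperimental, TMADescriptorStable)):
        ...  # unpack the TMA descriptor into wrapper parameters
    else:
        # The signature promises a TMA descriptor, so anything else is a bug.
        raise AssertionError(
            f"expected a TMA descriptor for {key!r}, got {type(raw_arg).__name__}"
        )
```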
…ion" Summary: Host-side TMA descriptors (StableTMADescriptor) are now handled in the lazy compile path. The generated C++ wrapper receives both the TMA descriptor and the underlying tensor as parameters. On the first call, the tensor is passed to Python where _wrap_tma_args reconstructs TensorDescriptor.from_tensor() for Triton's autotuner. On cached launches, the StableTMADescriptor fields are unpacked directly into kernel launch args. Scratch space is now allocated dynamically at runtime using sizes from the autotuning result. Authored with Claude. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo Differential Revision: [D96125146](https://our.internmc.facebook.com/intern/diff/D96125146) [ghstack-poisoned]
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR! Details for Dev Infra team: raised by workflow job.
@desertfire has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge
Merge failed. Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/re-exporting the PR! Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…#177306) Remove most cpp_wrapper skips from test_torchinductor.py since they can pass now. For some tests, change their skips to be conditioned on autotune_at_compile_time instead of cpp_wrapper. Fix `run_and_get_kernels` to extract kernel code using `R"TRITON(...)"` pattern for lazy compile cpp_wrapper mode, since kernels are embedded in C++ raw strings rather than Python triple-quoted strings. The remaining skips require more feature parity work to match cpp_wrapper with python_wrapper. Authored with Claude. Pull Request resolved: #177306 Approved by: https://github.com/PaulZhang12 ghstack dependencies: #175548
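A minimal sketch of the extraction that fix describes, assuming kernels are embedded in the generated C++ as raw string literals delimited by `TRITON`; the helper name and return shape are simplified stand-ins for `run_and_get_kernels`.

```python
import re

# Match the body between R"TRITON( and )TRITON" in generated C++ code;
# DOTALL lets the pattern span the multi-line kernel source.
_TRITON_RAW_STRING = re.compile(r'R"TRITON\((.*?)\)TRITON"', re.DOTALL)


def extract_kernels_from_cpp_wrapper(cpp_code: str) -> list[str]:
    """Pull Triton kernel sources out of a lazy-compile C++ wrapper (sketch)."""
    return _TRITON_RAW_STRING.findall(cpp_code)
```

Unlike the Python-wrapper path, where kernels sit in triple-quoted Python strings, the lazy-compile cpp_wrapper embeds them verbatim in the generated C++, so the test helper has to match the raw-string delimiters instead.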
Add `aten._grouped_mm.default` to the AOTI fallback ops list so that a c-shim is generated, enabling cpp_wrapper mode for grouped_mm. Authored with Claude. Pull Request resolved: #177307 Approved by: https://github.com/yushangdi ghstack dependencies: #175548, #177306
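Illustratively, that change amounts to one new entry in the AOTI fallback-op table from which c-shims are generated; the file location (torchgen/aoti/fallback_ops.py upstream) and the schema shown are assumptions here.

```python
# Each listed op gets a generated c-shim so cpp_wrapper/AOTI can call it
# without going through Python. Entry metadata shown is illustrative.
inductor_fallback_ops: dict[str, dict] = {
    # ... existing entries ...
    "aten._grouped_mm.default": {},  # new: enables cpp_wrapper for grouped_mm
}
```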
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo