[Bugfix] Fix nll bug via decomposition handling #177189

Closed
Lucaskabela wants to merge 1 commit into main from lucaskabela/nll_bugfix

@Lucaskabela Lucaskabela commented Mar 11, 2026

Fixes #89630

Summary

The NLL loss backward decomposition was failing under compile when self is 1D: target.unsqueeze(0) produced
a 2D index for the subsequent 1D scatter. The fix mirrors the C++ kernel's behavior of using only
target[0].
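
The shape mismatch described above can be reproduced in eager mode with a small sketch. This is not the PR's diff: `scatter_grad_1d` and its `use_first_element` flag are hypothetical names, and the sketch assumes `target` arrives with shape `(1,)` while `self` is 1D. It only illustrates why a 2D index breaks a 1D scatter and how reading `target[0]` avoids it.

```python
import torch


def scatter_grad_1d(self_1d: torch.Tensor, target: torch.Tensor,
                    use_first_element: bool) -> torch.Tensor:
    """Hypothetical helper: write -1.0 into a 1D gradient at the target class."""
    if use_first_element and target.dim() == 1:
        # Assumed fix: read only target[0], giving a 0-dim tensor.
        target = target[0]
    # A 0-dim target becomes shape (1,); a 1D target becomes shape (1, 1).
    index = target.unsqueeze(0)
    grad_input = torch.zeros_like(self_1d)
    # scatter requires index to have the same number of dims as grad_input.
    return torch.scatter(grad_input, 0, index, -1.0)


self_1d = torch.randn(4)      # 1D input with 4 classes
target = torch.tensor([2])    # 1D target holding one class index

try:
    scatter_grad_1d(self_1d, target, use_first_element=False)
    raised = False
except RuntimeError:
    # The (1, 1) index does not match the 1D gradient tensor.
    raised = True

fixed = scatter_grad_1d(self_1d, target, use_first_element=True)
print(raised, fixed)
```

With `use_first_element=True` the index has shape `(1,)`, so the scatter writes a single `-1.0` at position 2 of the gradient.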

Test

python -m pytest test/inductor/test_torchinductor.py -xvs -k "test_nll_loss_backward_1d_input"

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

pytorch-bot bot commented Mar 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177189

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 9ae0c7d with merge base f249065:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot commented Mar 11, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Lucaskabela Lucaskabela force-pushed the lucaskabela/nll_bugfix branch from b968311 to 9ae0c7d Compare March 11, 2026 21:40
@Lucaskabela Lucaskabela marked this pull request as ready for review March 11, 2026 21:40
@Lucaskabela
Contributor Author

@claude are these test failures related to my change?

claude bot commented Mar 11, 2026

Claude finished @Lucaskabela's task in 1m 13s.


CI Failure Analysis

  • Review the PR changes to understand scope
  • Analyze each CI failure for relevance to the change
  • Provide summary

No, none of the CI failures are related to your change. Here's the breakdown:

1. linux-jammy-py3.14-clang15 / test (crossref, 1, 2): Not related

This job was killed with exit code 137 (128 + 9, i.e. SIGKILL), which usually indicates the OOM killer. The logs show it was running test_fx.py when the process was killed, a test file unrelated to NLL loss or decompositions. This points to a resource/memory issue on the CI runner, not a test failure caused by your code.

2. test_dtensor_op_db_nanmean_cpu_float32: Not related (flaky)

Already marked as flaky by the CI bot, with a similar failure on trunk. This is a distributed tensor test unrelated to NLL loss.

3. detectron2_maskrcnn_r_50_fpn: Not related (unstable)

Marked as unstable and linked to existing issue #174929. This is an inductor benchmark flake, not related to your change.

Your change only touches torch/_decomp/decompositions.py (6 lines) and adds a test in test/inductor/test_torchinductor.py. None of the failures involve these files or the NLL loss code path.
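
The reading of exit code 137 in item 1 follows the common shell convention that a process killed by signal N is reported as 128 + N. A small sketch of that arithmetic, assuming a POSIX system (`describe_exit_code` is a hypothetical helper, not part of any CI tooling):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Hypothetical helper: map a shell-style exit status to a signal name."""
    if code > 128:
        # Shells report "killed by signal N" as exit status 128 + N.
        return signal.Signals(code - 128).name
    return f"exited with status {code}"


decoded = describe_exit_code(137)
print(decoded)
```

SIGKILL on its own does not prove an out-of-memory kill; it is only the usual explanation when a CI job dies this way.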


@Lucaskabela Lucaskabela requested a review from azahed98 March 12, 2026 00:16
@Lucaskabela Lucaskabela requested a review from frgossen March 12, 2026 15:58
@Lucaskabela
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Mar 12, 2026
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

pytorch-bot bot commented Mar 12, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@Lucaskabela
Contributor Author

@pytorchbot label "topic: not user facing"

@Lucaskabela
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-jammy-py3.14-clang15 / test (crossref, 1, 2, lf.linux.2xlarge), pull / linux-jammy-py3.10-gcc11 / test (distributed, 1, 2, lf.linux.2xlarge), inductor / inductor-cpu-test / test (cpu_inductor_torchbench, 1, 2, linux.2xlarge.amx, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Fixes pytorch#89630

## Summary
The NLL loss backward decomposition was failing under compile when self is 1D: target.unsqueeze(0) produced
a 2D index for the subsequent 1D scatter. The fix mirrors the C++ kernel's behavior of using only
target[0].

## Test
```bash
python -m pytest test/inductor/test_torchinductor.py -xvs -k "test_nll_loss_backward_1d_input"
```

Pull Request resolved: pytorch#177189
Approved by: https://github.com/frgossen
@github-actions github-actions bot deleted the lucaskabela/nll_bugfix branch April 12, 2026 02:25

Development

Successfully merging this pull request may close these issues.

[dynamo] RuntimeError: Failed running call_function aten.nll_loss_backward(*(FakeTensor(FakeTensor(...