add OpInfo for torch.nn.functional.nll_loss #63854
pmeier wants to merge 5 commits into pytorch:master

Conversation
💊 CI failures summary: As of commit 93bd050 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
Failures on CUDA look real: the failures also happen in slow mode, i.e. setting …
@pmeier what does the error look like with …?
The same as the middle part of the message above, although here the differences are visible before the 5th decimal place.
I think the failures are due to non-determinism that stems from reducing the output to a scalar. Setting …
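One way to probe the non-determinism hypothesis is to compute the gradient of the reduced loss twice with identical inputs and compare; a minimal CPU sketch (the shapes and seed here are illustrative, not taken from the PR):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
inp = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
target = torch.randint(0, 3, (4,))  # class indices in [0, 3)

def grad_of(reduction):
    # reduce the output to a scalar and take the gradient w.r.t. the input
    out = F.nll_loss(inp, target, reduction=reduction)
    grad, = torch.autograd.grad(out.sum(), inp)
    return grad

g1 = grad_of("mean")
g2 = grad_of("mean")
# on CPU this comparison is expected to hold exactly; on CUDA,
# non-deterministic reductions could make the two gradients differ
assert torch.equal(g1, g2)
```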
```python
# (shape_2d, dict()),
# ((*shape_2d, 3, 3), dict()),
# (shape_2d, dict(weight=True)),
# (shape_2d, dict(ignore_index=1)),
# (shape_2d, dict(reduction="mean")),
# (shape_2d, dict(reduction="sum")),
```
Enabling any of these sample inputs leads to gradcheck failures. They all have in common that `reduction="mean"` is the default value and thus a reduction is performed. `reduction="none"` uses a different code path and works fine. cc @albanD
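For reference, the `reduction="none"` path that does pass can be exercised with `torch.autograd.gradcheck` directly; a minimal sketch with hypothetical shapes (nll_loss is linear in its input, so the numeric and analytic gradients should agree):

```python
import torch
import torch.nn.functional as F
from torch.autograd import gradcheck

# gradcheck wants double precision inputs
inp = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
target = torch.randint(0, 3, (4,))  # class indices, not differentiated

# reduction="none" takes the per-element code path mentioned above
ok = gradcheck(lambda i: F.nll_loss(i, target, reduction="none"), (inp,))
```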
Codecov Report
```diff
@@            Coverage Diff             @@
##           master   #63854      +/-   ##
==========================================
- Coverage   66.85%   66.84%   -0.01%
==========================================
  Files         695      695
  Lines       90722    90748      +26
==========================================
+ Hits        60649    60664      +15
- Misses      30073    30084      +11
```
Closed in favor of #64203.
Addresses pytorch/functorch#78.
cc @albanD @mruberry @jbschlosser @VitalyFedyunin @walterddr
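For context, a sample-inputs helper for nll_loss along the lines of the commented-out cases above might look like the following. This is a simplified sketch, not the PR's code: the function name is hypothetical, plain tuples stand in for `SampleInput` objects, and `weight=True` is replaced by an actual per-class weight tensor.

```python
import torch
import torch.nn.functional as F

def sample_inputs_nll_loss(shape_2d=(4, 3)):
    # hypothetical helper mirroring the commented-out cases in the PR
    num_classes = shape_2d[1]
    cases = [
        (shape_2d, {}),
        ((*shape_2d, 3, 3), {}),  # input with extra spatial dims (N, C, d1, d2)
        (shape_2d, {"weight": torch.rand(num_classes)}),
        (shape_2d, {"ignore_index": 1}),
        (shape_2d, {"reduction": "mean"}),
        (shape_2d, {"reduction": "sum"}),
    ]
    samples = []
    for shape, kwargs in cases:
        inp = torch.randn(shape)
        # target has shape (N, d1, d2, ...), i.e. input shape minus the class dim
        target = torch.randint(0, num_classes, (shape[0], *shape[2:]))
        samples.append((inp, target, kwargs))
    return samples

samples = sample_inputs_nll_loss()
for inp, target, kwargs in samples:
    F.nll_loss(inp, target, **kwargs)  # each sample is accepted by the op
```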