run single-threaded gradgradcheck in test_nn #40999

Closed
ngimel wants to merge 1 commit into pytorch:master from ngimel:speed_tests1

Collaborator

@ngimel ngimel commented Jul 6, 2020

The most time-consuming tests in test_nn (taking about half of the total run time) were the gradgradchecks on Conv3d. This PR reduces their sizes and, most importantly, runs gradgradcheck single-threaded, because that cuts the time of the Conv3d tests by an order of magnitude and barely affects other tests.
These changes bring test_nn time down from 1200 s to ~550 s on my machine.
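
As an illustration of the single-threaded idea, here is a minimal sketch of a context manager that forces one thread for the duration of a check and then restores the previous setting. The helper name `single_threaded` and the stand-in getter and setter are hypothetical and exist only so the sketch runs on its own; the actual change in `torch/testing/_internal/common_nn.py` is not shown in this thread and may be structured differently.

```python
from contextlib import contextmanager


@contextmanager
def single_threaded(get_num_threads, set_num_threads):
    """Temporarily force one thread, restoring the previous count on exit."""
    previous = get_num_threads()
    set_num_threads(1)
    try:
        yield
    finally:
        # Restore even if the wrapped check raises.
        set_num_threads(previous)


# Stand-in for a library's global thread setting (hypothetical, for the demo only).
_state = {"threads": 8}


def fake_get_num_threads():
    return _state["threads"]


def fake_set_num_threads(n):
    _state["threads"] = n


def threads_seen_inside():
    # The expensive check would run here, inside the `with` block.
    with single_threaded(fake_get_num_threads, fake_set_num_threads):
        return fake_get_num_threads()
```

With PyTorch, the two callables would presumably be `torch.get_num_threads` and `torch.set_num_threads`, with the `gradgradcheck` call placed inside the `with` block, so that tests which rely on multithreading keep the original setting afterwards.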

@ngimel ngimel requested review from albanD and mruberry July 6, 2020 04:20
Collaborator

@albanD albanD left a comment

LGTM.

Do we want to open a task to start running tests using multiprocessing if we run them in a single thread?

Comment thread torch/testing/_internal/common_nn.py
Collaborator Author

ngimel commented Jul 6, 2020

> Do we want to open a task to start running tests using multiprocessing if we run them in a single thread?

I don't think so:

  1. not all tests benefit from, or are at least unaffected by, disabling multithreading
  2. some tests specifically request large input tensors to check that the multithreaded implementation is correct.

@mruberry looked at using multiprocessing for tests, but it did not seem worth it.

Collaborator

albanD commented Jul 6, 2020

Sounds good! Thanks for the details.

Collaborator

@mruberry mruberry left a comment

Awesome!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in dac63a1.

csarofeen pushed a commit to csarofeen/pytorch that referenced this pull request Jul 7, 2020
Summary:
Most time-consuming tests in test_nn (taking about half the time) were gradgradchecks on Conv3d. Reduce their sizes, and, most importantly, run gradgradcheck single-threaded, because that cuts the time of conv3d tests by an order of magnitude, and barely affects other tests.
These changes bring test_nn time down from 1200 s to ~550 s on my machine.

Pull Request resolved: pytorch#40999

Differential Revision: D22396896

Pulled By: ngimel

fbshipit-source-id: 3b247caceb65d64be54499de1a55de377fdf9506
Collaborator

nairbv commented Jul 7, 2020

Looks like this broke pytorch_linux_xenial_cuda11_0_cudnn8_py3_gcc7_test; will unland.

Comment on lines -2049 to +2051
-constructor=lambda: nn.Conv3d(4, 6, kernel_size=3, groups=2),
+constructor=lambda: nn.Conv3d(2, 4, kernel_size=3, groups=2),
 cpp_constructor_args='torch::nn::Conv3dOptions(4, 6, 3).groups(2)',
-input_size=(2, 4, 4, 5, 4),
+input_size=(1, 2, 3, 3, 3),
Contributor

@zasdfgbnm, it looks like this change causes cuDNN v8 to fail with:

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Exception raised from operator() at /var/lib/jenkins/workspace/aten/src/ATen/native/cudnn/Conv.cpp:845 

See https://app.circleci.com/pipelines/github/pytorch/pytorch/188319/workflows/cf1b0d7c-66d5-4ba1-8d70-7d069999676f/jobs/6137541/tests
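
To gauge how much the Conv3d case in the diff above shrinks, the following sketch counts input, output, and parameter elements for the old and new configurations. It assumes the `nn.Conv3d` defaults (stride 1, no padding, no dilation, bias enabled); the helper `conv3d_sizes` is hypothetical and not part of PyTorch. It only illustrates the size reduction and does not explain the cuDNN failure reported above.

```python
from math import prod


def conv3d_sizes(in_ch, out_ch, kernel, groups, input_size):
    """Element counts for a Conv3d with stride 1, no padding, no dilation, with bias."""
    batch, channels, *spatial = input_size
    if channels != in_ch:
        raise ValueError("input channel dimension does not match in_ch")
    # With stride 1 and no padding, each spatial dim shrinks by kernel - 1.
    out_spatial = [s - kernel + 1 for s in spatial]
    output_size = (batch, out_ch, *out_spatial)
    # Grouped convolution: each output channel sees in_ch // groups input channels.
    weight_elems = out_ch * (in_ch // groups) * kernel ** 3
    bias_elems = out_ch
    return {
        "input_elems": prod(input_size),
        "output_size": output_size,
        "output_elems": prod(output_size),
        "param_elems": weight_elems + bias_elems,
    }


old = conv3d_sizes(4, 6, 3, 2, (2, 4, 4, 5, 4))
new = conv3d_sizes(2, 4, 3, 2, (1, 2, 3, 3, 3))
print(old)  # 640 input elements, 144 output elements, 330 parameters
print(new)  # 54 input elements, 4 output elements, 112 parameters
```

Because a finite-difference gradient check perturbs each input and parameter element in turn, shrinking these counts reduces the number of forward and backward evaluations the check needs.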

facebook-github-bot pushed a commit that referenced this pull request Jul 9, 2020
Summary:
Reland #40999

Pull Request resolved: #41147

Reviewed By: mruberry

Differential Revision: D22450357

Pulled By: ngimel

fbshipit-source-id: 02b6e020af5e6ef52542266bd9752b9cfbec4159
csarofeen added a commit to csarofeen/pytorch that referenced this pull request Aug 16, 2020
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Most time-consuming tests in test_nn (taking about half the time) were gradgradchecks on Conv3d. Reduce their sizes, and, most importantly, run gradgradcheck single-threaded, because that cuts the time of conv3d tests by an order of magnitude, and barely affects other tests.
These changes bring test_nn time down from 1200 s to ~550 s on my machine.

Pull Request resolved: pytorch#40999

Differential Revision: D22396896

Pulled By: ngimel

fbshipit-source-id: 3b247caceb65d64be54499de1a55de377fdf9506
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Reland pytorch#40999

Pull Request resolved: pytorch#41147

Reviewed By: mruberry

Differential Revision: D22450357

Pulled By: ngimel

fbshipit-source-id: 02b6e020af5e6ef52542266bd9752b9cfbec4159