Migrate `var` & `std` to ATen by ShawnZhong · Pull Request #39967 · pytorch/pytorch

ShawnZhong · 2020-06-12T23:05:30Z

Not sure why there are so many issues for std & var, but this PR should close them all:
std: Fix #24771, Fix #24676, Fix #24639, Fix #24529
var: Fix #24782, Fix #24677, Fix #24652, Fix #24530

import time
import torch

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

for device in (torch.device("cpu"), torch.device("cuda")):
    for size in (
        [100000000],
        [10000, 10000],
        [1000, 1000, 100],
        [100, 100, 100, 100],
    ):
        t = torch.randn(*size, device=device)
        total_time = 0
        for i in range(10):
            t1 = _time()
            t.std()
            t2 = _time()
            total_time += t2 - t1
        print(f"Tensor of size {size} on {device}: {total_time / 10}")

Before:

Tensor of size [100000000] on cpu: 0.36041643619537356
Tensor of size [10000, 10000] on cpu: 0.37235140800476074
Tensor of size [1000, 1000, 100] on cpu: 0.386572527885437
Tensor of size [100, 100, 100, 100] on cpu: 0.37404844760894773
Tensor of size [100000000] on cuda: 0.0021645784378051757
Tensor of size [10000, 10000] on cuda: 0.002090191841125488
Tensor of size [1000, 1000, 100] on cuda: 0.00208127498626709
Tensor of size [100, 100, 100, 100] on cuda: 0.0020844221115112306

After:

Tensor of size [100000000] on cpu: 0.1339871883392334
Tensor of size [10000, 10000] on cpu: 0.1343991994857788
Tensor of size [1000, 1000, 100] on cpu: 0.1346735954284668
Tensor of size [100, 100, 100, 100] on cpu: 0.11906447410583496
Tensor of size [100000000] on cuda: 0.0013531208038330077
Tensor of size [10000, 10000] on cuda: 0.0012922048568725585
Tensor of size [1000, 1000, 100] on cuda: 0.001285886764526367
Tensor of size [100, 100, 100, 100] on cuda: 0.0012899160385131836
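For a quick read of the two listings above, the per-case speedup can be computed directly from the posted timings. This is only a sketch: the dictionary and variable names are illustrative and not part of the PR, and the values are copied verbatim from the "Before" and "After" output.

```python
# Timings (seconds per call) copied from the "Before" and "After" output above.
# Keys are (shape label, device); all names here are illustrative only.
before = {
    ("[100000000]", "cpu"): 0.36041643619537356,
    ("[10000, 10000]", "cpu"): 0.37235140800476074,
    ("[1000, 1000, 100]", "cpu"): 0.386572527885437,
    ("[100, 100, 100, 100]", "cpu"): 0.37404844760894773,
    ("[100000000]", "cuda"): 0.0021645784378051757,
    ("[10000, 10000]", "cuda"): 0.002090191841125488,
    ("[1000, 1000, 100]", "cuda"): 0.00208127498626709,
    ("[100, 100, 100, 100]", "cuda"): 0.0020844221115112306,
}
after = {
    ("[100000000]", "cpu"): 0.1339871883392334,
    ("[10000, 10000]", "cpu"): 0.1343991994857788,
    ("[1000, 1000, 100]", "cpu"): 0.1346735954284668,
    ("[100, 100, 100, 100]", "cpu"): 0.11906447410583496,
    ("[100000000]", "cuda"): 0.0013531208038330077,
    ("[10000, 10000]", "cuda"): 0.0012922048568725585,
    ("[1000, 1000, 100]", "cuda"): 0.001285886764526367,
    ("[100, 100, 100, 100]", "cuda"): 0.0012899160385131836,
}

# A ratio above 1 means the "After" build is faster for that case.
speedup = {key: before[key] / after[key] for key in before}

for (shape, device), ratio in speedup.items():
    print(f"{shape} on {device}: {ratio:.2f}x")
```

On these numbers the CPU cases come out at roughly 2.7x to 3.1x and the CUDA cases at roughly 1.6x.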

cc: @VitalyFedyunin

dr-ci · 2020-06-12T23:11:24Z

💊 CI failures summary and remediations

As of commit 552f160 (more details on the Dr. CI page):

4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_windows_vs2019_py36_cuda10.1_test1 (1/4)

Step: "Test"

RuntimeError: test_nn failed!

  File "C:\Jenkins\Miniconda3\lib\site-packages\scipy\_distributor_init.py", line 61, in <module> 
    WinDLL(os.path.abspath(filename)) 
  File "C:\Jenkins\Miniconda3\lib\ctypes\__init__.py", line 348, in __init__ 
    self._handle = _dlopen(self._name, mode) 
OSError: [WinError 126] The specified module could not be found 
Traceback (most recent call last): 
  File "run_test.py", line 726, in <module> 
    main() 
  File "run_test.py", line 719, in main 
    raise RuntimeError(message) 
RuntimeError: test_nn failed! 
 
(base) circleci@PACKER-5ECD3242 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

pytorch_windows_vs2019_py36_cuda10.1_test2 (2/4)

Step: "Test"

RuntimeError: test_cuda failed!

  File "C:\Jenkins\Miniconda3\lib\site-packages\scipy\_distributor_init.py", line 61, in <module> 
    WinDLL(os.path.abspath(filename)) 
  File "C:\Jenkins\Miniconda3\lib\ctypes\__init__.py", line 348, in __init__ 
    self._handle = _dlopen(self._name, mode) 
OSError: [WinError 126] The specified module could not be found 
Traceback (most recent call last): 
  File "run_test.py", line 726, in <module> 
    main() 
  File "run_test.py", line 719, in main 
    raise RuntimeError(message) 
RuntimeError: test_cuda failed! 
 
(base) circleci@PACKER-5EE89583 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

pytorch_windows_vs2019_py36_cuda10.1_on_cpu_test1 (3/4)

Step: "Test"

RuntimeError: test_nn failed!

  File "C:\Jenkins\Miniconda3\lib\site-packages\scipy\_distributor_init.py", line 61, in <module> 
    WinDLL(os.path.abspath(filename)) 
  File "C:\Jenkins\Miniconda3\lib\ctypes\__init__.py", line 348, in __init__ 
    self._handle = _dlopen(self._name, mode) 
OSError: [WinError 126] The specified module could not be found 
Traceback (most recent call last): 
  File "run_test.py", line 726, in <module> 
    main() 
  File "run_test.py", line 719, in main 
    raise RuntimeError(message) 
RuntimeError: test_nn failed! 
 
(base) circleci@PACKER-5EE89590 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

pytorch_windows_vs2019_py36_cpu_test1 (4/4)

Step: "Test"

RuntimeError: test_nn failed!

  File "C:\Jenkins\Miniconda3\lib\site-packages\scipy\_distributor_init.py", line 61, in <module> 
    WinDLL(os.path.abspath(filename)) 
  File "C:\Jenkins\Miniconda3\lib\ctypes\__init__.py", line 348, in __init__ 
    self._handle = _dlopen(self._name, mode) 
OSError: [WinError 126] The specified module could not be found 
Traceback (most recent call last): 
  File "run_test.py", line 726, in <module> 
    main() 
  File "run_test.py", line 719, in main 
    raise RuntimeError(message) 
RuntimeError: test_nn failed! 
 
(base) circleci@PACKER-5EE89590 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1  
+ cleanup
+ retcode=1
+ set +x

VitalyFedyunin

Please rebase

facebook-github-bot

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

robieta · 2020-06-23T21:28:00Z

@ShawnZhong @VitalyFedyunin @ngimel
This PR significantly regresses single-threaded CPU performance. (I do see speedups when multi-threading is enabled.) Results from Shawn's script with `torch.set_num_threads(1)` added at the start:

c4fc278

Tensor of size [100000000] on cpu: 0.18599200248718262
Tensor of size [10000, 10000] on cpu: 0.17864339351654052
Tensor of size [1000, 1000, 100] on cpu: 0.1743138313293457
Tensor of size [100, 100, 100, 100] on cpu: 0.1824030637741089
Tensor of size [100000000] on cuda: 0.0017017841339111329
Tensor of size [10000, 10000] on cuda: 0.001610112190246582
Tensor of size [1000, 1000, 100] on cuda: 0.00161285400390625
Tensor of size [100, 100, 100, 100] on cuda: 0.0016181468963623047

7a3c223 (This PR)

Tensor of size [100000000] on cpu: 0.8823971509933471
Tensor of size [10000, 10000] on cpu: 0.8826733112335206
Tensor of size [1000, 1000, 100] on cpu: 0.8823000669479371
Tensor of size [100, 100, 100, 100] on cpu: 0.882302713394165
Tensor of size [100000000] on cuda: 0.0011995553970336914
Tensor of size [10000, 10000] on cuda: 0.001114821434020996
Tensor of size [1000, 1000, 100] on cuda: 0.0011154413223266602
Tensor of size [100, 100, 100, 100] on cuda: 0.001114058494567871
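To put a number on the regression, the ratio between the two commits can be computed from the timings above. This is a sketch only: the variable names are illustrative, and the values are copied verbatim from the two listings (c4fc278 as the baseline, 7a3c223 as this PR), both measured with a single CPU thread.

```python
# Seconds per call, copied from the single-threaded listings above.
# Order of shapes: [100000000], [10000, 10000], [1000, 1000, 100], [100, 100, 100, 100].
baseline_cpu = [0.18599200248718262, 0.17864339351654052,
                0.1743138313293457, 0.1824030637741089]
pr_cpu = [0.8823971509933471, 0.8826733112335206,
          0.8823000669479371, 0.882302713394165]
baseline_cuda = [0.0017017841339111329, 0.001610112190246582,
                 0.00161285400390625, 0.0016181468963623047]
pr_cuda = [0.0011995553970336914, 0.001114821434020996,
           0.0011154413223266602, 0.001114058494567871]

# CPU: how many times slower the PR is than the baseline (above 1 is a regression).
cpu_slowdown = [new / old for old, new in zip(baseline_cpu, pr_cpu)]
# CUDA: how many times faster the PR is than the baseline (above 1 is an improvement).
cuda_speedup = [old / new for old, new in zip(baseline_cuda, pr_cuda)]

print("CPU slowdown:", [f"{r:.2f}x" for r in cpu_slowdown])
print("CUDA speedup:", [f"{r:.2f}x" for r in cuda_speedup])
```

With one thread, the CPU cases are roughly 4.7x to 5.1x slower on this PR, while the CUDA cases are about 1.4x faster.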

My testing on GPU agrees that this is generally an improvement, though there are some cases with regressions. (#38338 will soon be updated with the script that I used to benchmark this PR.)

Unfortunately, we may need to revert this PR, since the impact on single-threaded CPU speed is quite severe.

VitalyFedyunin · 2020-06-23T23:51:16Z

@robieta Sounds reasonable. Let me revert it first; afterwards we can either fix the single-threaded case (if that is quick) or at least apply the GPU part first.

Summary:
Not sure why there are so many issues for std & var, but this PR should close them all:
std: Fix pytorch#24771, Fix pytorch#24676, Fix pytorch#24639, Fix pytorch#24529
var: Fix pytorch#24782, Fix pytorch#24677, Fix pytorch#24652, Fix pytorch#24530

```py
import time
import torch

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

for device in (torch.device("cpu"), torch.device("cuda")):
    for size in (
        [100000000],
        [10000, 10000],
        [1000, 1000, 100],
        [100, 100, 100, 100],
    ):
        t = torch.randn(*size, device=device)
        total_time = 0
        for i in range(10):
            t1 = _time()
            t.std()
            t2 = _time()
            total_time += t2 - t1
        print(f"Tensor of size {size} on {device}: {total_time / 10}")
```

Before:

```
Tensor of size [100000000] on cpu: 0.36041643619537356
Tensor of size [10000, 10000] on cpu: 0.37235140800476074
Tensor of size [1000, 1000, 100] on cpu: 0.386572527885437
Tensor of size [100, 100, 100, 100] on cpu: 0.37404844760894773
Tensor of size [100000000] on cuda: 0.0021645784378051757
Tensor of size [10000, 10000] on cuda: 0.002090191841125488
Tensor of size [1000, 1000, 100] on cuda: 0.00208127498626709
Tensor of size [100, 100, 100, 100] on cuda: 0.0020844221115112306
```

After:

```
Tensor of size [100000000] on cpu: 0.1339871883392334
Tensor of size [10000, 10000] on cpu: 0.1343991994857788
Tensor of size [1000, 1000, 100] on cpu: 0.1346735954284668
Tensor of size [100, 100, 100, 100] on cpu: 0.11906447410583496
Tensor of size [100000000] on cuda: 0.0013531208038330077
Tensor of size [10000, 10000] on cuda: 0.0012922048568725585
Tensor of size [1000, 1000, 100] on cuda: 0.001285886764526367
Tensor of size [100, 100, 100, 100] on cuda: 0.0012899160385131836
```

cc: VitalyFedyunin

Pull Request resolved: pytorch#39967

Differential Revision: D22162469

Pulled By: VitalyFedyunin

fbshipit-source-id: 8d901c779767b00f81cd6231bc665e04f297b4c3

ShawnZhong force-pushed the std_var branch 2 times, most recently from eaacf05 to ac79193 Compare June 12, 2020 23:20

ShawnZhong changed the title ~~[WIP] Migrate var & std to ATen~~ [WIP][DO NOT REVIEW] Migrate var & std to ATen Jun 12, 2020

ShawnZhong force-pushed the std_var branch from beababc to 3c6d1a0 Compare June 13, 2020 02:23

ShawnZhong changed the title ~~[WIP][DO NOT REVIEW] Migrate var & std to ATen~~ Migrate var & std to ATen Jun 13, 2020

ShawnZhong marked this pull request as ready for review June 13, 2020 03:10

ShawnZhong marked this pull request as draft June 13, 2020 03:27

ShawnZhong force-pushed the std_var branch from f5a9219 to b1f420a Compare June 13, 2020 03:43

ShawnZhong marked this pull request as ready for review June 13, 2020 06:29

VitalyFedyunin self-requested a review June 22, 2020 00:27

VitalyFedyunin approved these changes Jun 22, 2020

ShawnZhong force-pushed the std_var branch from 8912a3f to 75317c0 Compare June 22, 2020 12:30

Migrate var & std to ATen

552f160

ShawnZhong force-pushed the std_var branch from 75317c0 to 552f160 Compare June 22, 2020 12:57

facebook-github-bot reviewed Jun 22, 2020

pytorchbot added the open source label Jun 22, 2020

facebook-github-bot closed this in 7a3c223 Jun 22, 2020

robieta mentioned this pull request Jun 24, 2020

Prototype benchmarking util #38338

Closed

Migrate `var` & `std` to ATen #39967
ShawnZhong wants to merge 1 commit into pytorch:master from ShawnZhong:std_var
