[aten] Optimizing reshape#50859

Closed
hlu1 wants to merge 1 commit into pytorch:master from hlu1:export-D25986759

Conversation

Contributor

@hlu1 hlu1 commented Jan 21, 2021

Summary: aten::view repeats the infer_size and computeStride calls that reshape has already performed. Skipping those two redundant calls, and constructing the result directly instead of going through aten::view (and therefore the dispatcher), proves to be more efficient.

Test Plan:
Unit test:

```
buck test //caffe2/test:torch
```

Benchmark:

```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```

Reduces the total time spent on reshape and flatten from 3.16% to 2.46% (net 0.7% reduction).

```
Before: PyTorch run finished. Milliseconds per iter: 0.0736055. Iters per second: 13585.9
    0.0013895 ms.    1.92361%. aten::reshape (2 nodes)
    0.000892179 ms.    1.23513%. aten::flatten (1 nodes)

After: PyTorch run finished. Milliseconds per iter: 0.0722102. Iters per second: 13848.5
    0.000978748 ms.    1.36668%. aten::reshape (2 nodes)
    0.000786076 ms.    1.09764%. aten::flatten (1 nodes)
```

Differential Revision: D25986759

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D25986759

Summary: Pull Request resolved: pytorch#50859

Test Plan:
Unit test:
```
buck test //caffe2/test:torch
```
Benchmark:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```

Reduces the total time spent on flatten from 1.22% to 0.97% (net 0.25% reduction).
```
Before:

Static runtime ms per iter: 0.0725054. Iters per second: 13792.1
    0.000857179 ms.    1.21862%. aten::flatten (1 nodes)

After:

Static runtime ms per iter: 0.0720371. Iters per second: 13881.7
    0.000686155 ms.    0.97151%. aten::flatten (1 nodes)
```

Differential Revision: D25986759

fbshipit-source-id: afddbe57fd00e9a6c5f589b8ee9d7f1c06156374
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D25986759

@hlu1 hlu1 force-pushed the export-D25986759 branch from 907732f to 50b8bdd Compare January 21, 2021 07:34
@codecov

codecov Bot commented Jan 21, 2021

Codecov Report

Merging #50859 (50b8bdd) into master (439afda) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##           master   #50859      +/-   ##
==========================================
- Coverage   81.02%   81.02%   -0.01%     
==========================================
  Files        1916     1916              
  Lines      209285   209285              
==========================================
- Hits       169572   169564       -8     
- Misses      39713    39721       +8     
```

@facebook-github-bot
Contributor

This pull request has been merged in 6aec1eb.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#50859 (commit message identical to the exported revision above)
Reviewed By: ajyu

Differential Revision: D25986759

fbshipit-source-id: dc0f542c56a688d331d349845b78084577970476
2 participants