[aten] Optimizing reshape#50859

Closed
hlu1 wants to merge 1 commit into pytorch:master from hlu1:export-D25986759

Conversation

Contributor

@hlu1 hlu1 commented Jan 21, 2021

Summary: aten::view repeats the infer_size and computeStride calls that reshape has already performed. Skipping those two redundant calls, and constructing the result directly instead of going through aten::view (and therefore the dispatcher), proves to be more efficient.

Test Plan:
Unit test:

```
buck test //caffe2/test:torch
```

Benchmark:

```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```

Reduces the total time spent on reshape and flatten from 3.16% to 2.46% (net 0.7% reduction).

```
Before: PyTorch run finished. Milliseconds per iter: 0.0736055. Iters per second: 13585.9
    0.0013895 ms.    1.92361%. aten::reshape (2 nodes)
    0.000892179 ms.    1.23513%. aten::flatten (1 nodes)

After: PyTorch run finished. Milliseconds per iter: 0.0722102. Iters per second: 13848.5
    0.000978748 ms.    1.36668%. aten::reshape (2 nodes)
    0.000786076 ms.    1.09764%. aten::flatten (1 nodes)
```

Differential Revision: D25986759

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D25986759

Summary: Pull Request resolved: pytorch#50859

Test Plan:
Unit test:
```
buck test //caffe2/test:torch
```
Benchmark:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```

Reduces the total time spent on flatten from 1.22% to 0.97% (net 0.25% reduction).
```
Before:

Static runtime ms per iter: 0.0725054. Iters per second: 13792.1
    0.000857179 ms.    1.21862%. aten::flatten (1 nodes)

After:

Static runtime ms per iter: 0.0720371. Iters per second: 13881.7
    0.000686155 ms.    0.97151%. aten::flatten (1 nodes)
```

Differential Revision: D25986759

fbshipit-source-id: afddbe57fd00e9a6c5f589b8ee9d7f1c06156374
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D25986759

@hlu1 hlu1 force-pushed the export-D25986759 branch from 907732f to 50b8bdd Compare January 21, 2021 07:34
@codecov

codecov Bot commented Jan 21, 2021

Codecov Report

Merging #50859 (50b8bdd) into master (439afda) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

```
@@            Coverage Diff             @@
##           master   #50859      +/-   ##
==========================================
- Coverage   81.02%   81.02%   -0.01%     
==========================================
  Files        1916     1916              
  Lines      209285   209285              
==========================================
- Hits       169572   169564       -8     
- Misses      39713    39721       +8     
```

@facebook-github-bot
Contributor

This pull request has been merged in 6aec1eb.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#50859 (commit message identical to the exported revision above)
Reviewed By: ajyu

Differential Revision: D25986759

fbshipit-source-id: dc0f542c56a688d331d349845b78084577970476
2 participants