[aten] Optimizing reshape #50859
Closed
hlu1 wants to merge 1 commit into pytorch:master from
Contributor
This pull request was exported from Phabricator. Differential Revision: D25986759
Summary: Pull Request resolved: pytorch#50859

Test Plan:
Unit test:
```
buck test //caffe2/test:torch
```
Benchmark:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```
Reduces the total time spent on flatten from 1.22% to 0.97% (net 0.25% reduction).
```
Before:
Static runtime ms per iter: 0.0725054. Iters per second: 13792.1
0.000857179 ms. 1.21862%. aten::flatten (1 nodes)

After:
Static runtime ms per iter: 0.0720371. Iters per second: 13881.7
0.000686155 ms. 0.97151%. aten::flatten (1 nodes)
```

Differential Revision: D25986759

fbshipit-source-id: afddbe57fd00e9a6c5f589b8ee9d7f1c06156374
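For readers comparing the before/after figures, the arithmetic can be sketched as below. This is only a reading aid, not part of the PR: the variable names are made up for illustration, and the numbers are copied from the benchmark output in the commit message. It separates the drop in flatten's share of the profile (in percentage points) from the relative drop in flatten's own time and in the per-iteration time.

```python
# Figures copied from the benchmark output in the commit message.
before_iter_ms = 0.0725054      # static runtime ms per iter, before
after_iter_ms = 0.0720371       # static runtime ms per iter, after
before_flatten_ms = 0.000857179  # aten::flatten time per iter, before
after_flatten_ms = 0.000686155   # aten::flatten time per iter, after
before_share_pct = 1.21862      # aten::flatten share of the profile, before
after_share_pct = 0.97151       # aten::flatten share of the profile, after

# Drop in flatten's share of the profile, in percentage points.
share_drop_points = before_share_pct - after_share_pct

# Relative reduction in the time spent in aten::flatten itself.
flatten_rel_drop_pct = 100 * (before_flatten_ms - after_flatten_ms) / before_flatten_ms

# Relative reduction in the whole per-iteration time.
iter_rel_drop_pct = 100 * (before_iter_ms - after_iter_ms) / before_iter_ms

print(f"share drop: {share_drop_points:.2f} percentage points")
print(f"flatten time drop: {flatten_rel_drop_pct:.1f}%")
print(f"per-iter time drop: {iter_rel_drop_pct:.2f}%")
```

Read this way, the "net 0.25% reduction" is a change of about 0.25 percentage points in flatten's share, which corresponds to roughly a 20% cut in flatten's own time and about a 0.65% cut in time per iteration.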
907732f to 50b8bdd
Codecov Report
@@            Coverage Diff             @@
##           master   #50859      +/-   ##
==========================================
- Coverage   81.02%   81.02%   -0.01%
==========================================
  Files        1916     1916
  Lines      209285   209285
==========================================
- Hits       169572   169564       -8
- Misses      39713    39721       +8
Contributor
This pull request has been merged in 6aec1eb.
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#50859

Test Plan:
Unit test:
```
buck test //caffe2/test:torch
```
Benchmark:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 13 \
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/traced_precomputation.pt \
--pt_inputs=/home/hlu/ads/adindexer/adindexer_ctr_mobilefeed/pt/merge_v2/container_precomputation_bs20.pt \
--iters=10000 --warmup_iters=10000 --num_threads=1 --pt_enable_static_runtime=true \
--pt_cleanup_activations=true --pt_enable_out_variant=true --do_profile=true
```
Reduces the total time spent on flatten from 1.22% to 0.97% (net 0.25% reduction).
```
Before:
Static runtime ms per iter: 0.0725054. Iters per second: 13792.1
0.000857179 ms. 1.21862%. aten::flatten (1 nodes)

After:
Static runtime ms per iter: 0.0720371. Iters per second: 13881.7
0.000686155 ms. 0.97151%. aten::flatten (1 nodes)
```

Reviewed By: ajyu

Differential Revision: D25986759

fbshipit-source-id: dc0f542c56a688d331d349845b78084577970476
Summary:
`aten::view` calls `infer_size` and `computeStride` again, even though both are already done inside `reshape`. Skipping those two function calls, and not calling `aten::view` at all so that the dispatcher is bypassed, proves to be more efficient.
Test Plan:
Unit test:
Benchmark:
Reduces the total time spent on reshape and flatten from 3.16% to 2.46% (net 0.7% reduction).
Differential Revision: D25986759
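The redundancy described in the summary can be illustrated with a small, self-contained Python sketch. It is a toy model, not the ATen implementation: `FakeTensor`, the `calls` counter, `reshape_via_view`, and `reshape_direct` are hypothetical names introduced here. `compute_stride` is deliberately simplified to handle only contiguous inputs, and the copy fallback for non-viewable inputs is omitted.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical call counter, used only to make the redundant work visible.
calls = {"infer_size": 0, "compute_stride": 0}


@dataclass
class FakeTensor:
    """Toy stand-in for a tensor: metadata only, no storage."""
    shape: list
    stride: list


def contiguous_strides(shape):
    """Row-major strides for a given shape."""
    strides, acc = [], 1
    for dim in reversed(shape):
        strides.append(acc)
        acc *= max(dim, 1)
    return strides[::-1]


def infer_size(shape, numel):
    """Resolve at most one -1 in `shape` so its product equals `numel`."""
    calls["infer_size"] += 1
    shape = list(shape)
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)
    if unknown:
        if known == 0 or numel % known != 0:
            raise ValueError("shape is invalid for input size")
        shape[unknown[0]] = numel // known
    elif known != numel:
        raise ValueError("shape is invalid for input size")
    return shape


def compute_stride(t, new_shape):
    """Simplified assumption: only contiguous inputs are viewable."""
    calls["compute_stride"] += 1
    if t.stride == contiguous_strides(t.shape):
        return contiguous_strides(new_shape)
    return None


def view(t, shape):
    """Models a view that validates the shape and strides on its own."""
    new_shape = infer_size(shape, prod(t.shape))
    stride = compute_stride(t, new_shape)
    if stride is None:
        raise ValueError("view size is not compatible with input strides")
    return FakeTensor(new_shape, stride)


def reshape_via_view(t, shape):
    """Old pattern: check viewability, then let view() redo the same work."""
    new_shape = infer_size(shape, prod(t.shape))
    if compute_stride(t, new_shape) is not None:
        return view(t, new_shape)
    raise NotImplementedError("copy fallback omitted in this sketch")


def reshape_direct(t, shape):
    """New pattern: reuse the size and stride that were already computed."""
    new_shape = infer_size(shape, prod(t.shape))
    stride = compute_stride(t, new_shape)
    if stride is not None:
        return FakeTensor(new_shape, stride)
    raise NotImplementedError("copy fallback omitted in this sketch")
```

In this model, reshaping a contiguous `FakeTensor([2, 3, 4], [12, 4, 1])` to `[6, -1]` runs `infer_size` and `compute_stride` twice each through `reshape_via_view`, and once each through `reshape_direct`, with identical results. The dispatcher overhead mentioned in the summary is not modeled here.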