Add __torch_function__ benchmarks. #34645

Closed

hameerabbasi wants to merge 5 commits into pytorch:master from Quansight:torch-function-benchmark

Conversation

@hameerabbasi
Collaborator

No description provided.

@hameerabbasi
Collaborator Author

Note: Do NOT merge this yet. It'll need updating after #34303 is merged.

@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch 2 times, most recently from c639ed9 to b78eaa0 Compare March 12, 2020 11:00
@dr-ci

dr-ci Bot commented Mar 12, 2020

💊 CircleCI build failures summary and remediations

As of commit f5fcd1d (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_build (1/4)

Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\include\algorithm(2749): error C2780: '_OutIt std::move(_InIt,_InIt,_OutIt)': expects 3 arguments - 1 provided
        with
        [
            T=int64_t,
            _Ty=c10::IValue,
            _RanIt=c10::impl::ListIterator<int64_t,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<c10::IValue>>>>
        ]
..\caffe2\operators\experimental\c10\cpu\expand_dims_cpu.cc(17): note: while compiling class template member function 'void caffe2::`anonymous-namespace'::expand_dims_cpu<float>::operator ()(const at::Tensor &,const at::Tensor &,c10::List<int64_t>)'
C:\Users\circleci\project\aten\src\ATen/core/boxing/kernel_functor.h(276): note: see reference to function template instantiation 'void caffe2::`anonymous-namespace'::expand_dims_cpu<float>::operator ()(const at::Tensor &,const at::Tensor &,c10::List<int64_t>)' being compiled
..\caffe2\operators\experimental\c10\cpu\expand_dims_cpu.cc(60): note: see reference to class template instantiation 'caffe2::`anonymous-namespace'::expand_dims_cpu<float>' being compiled
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\include\algorithm(2749): error C2672: 'std::move': no matching overloaded function found
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\include\algorithm(2749): error C2780: '_OutIt std::move(_InIt,_InIt,_OutIt)': expects 3 arguments - 1 provided
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\include\xutility(3939): note: see declaration of 'std::move'
Microsoft (R) C/C++ Optimizing Compiler Version 19.25.28610.4 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

ITION /MD /O2 /Ob2 /DNDEBUG /w /EHa /bigobj -DNDEBUG   -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /Z7 /EHa /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -O2 -DCAFFE2_BUILD_MAIN_LIB -DONNX_BUILD_MAIN_LIB -std:c++14 /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\quantization\server\batch_matmul_dnnlowp_op.cc.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ /FS -c ..\caffe2\quantization\server\batch_matmul_dnnlowp_op.cc 
Microsoft (R) C/C++ Optimizing Compiler Version 19.25.28610.4 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

 -DHAVE_AVX2_CPU_DEFINITION /MD /O2 /Ob2 /DNDEBUG /w /EHa /bigobj -DNDEBUG   -DCUDA_HAS_FP16=1 -DUSE_GCC_GET_CPUID -DUSE_AVX -DUSE_AVX2 -DTH_HAVE_THREAD /Z7 /EHa /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj -O2 -DCAFFE2_BUILD_MAIN_LIB -DONNX_BUILD_MAIN_LIB -std:c++14 /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\quantization\server\conv_relu_op.cc.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ /FS -c ..\caffe2\quantization\server\conv_relu_op.cc 
Microsoft (R) C/C++ Optimizing Compiler Version 19.25.28610.4 for x64

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (2/4)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 19 12:41:29 RuntimeError: test_type_promotion failed!
Mar 19 12:41:29 Ran 49 tests in 7.067s 
Mar 19 12:41:29  
Mar 19 12:41:29 FAILED (errors=1) 
Mar 19 12:41:29  
Mar 19 12:41:29 Generating XML reports... 
Mar 19 12:41:29 Traceback (most recent call last): 
Mar 19 12:41:29   File "test/run_test.py", line 674, in <module> 
Mar 19 12:41:29     main() 
Mar 19 12:41:29   File "test/run_test.py", line 667, in main 
Mar 19 12:41:29     raise RuntimeError(message) 
Mar 19 12:41:29 RuntimeError: test_type_promotion failed! 
Mar 19 12:41:29 + cleanup 
Mar 19 12:41:29 + retcode=1 
Mar 19 12:41:29 + set +x 
Mar 19 12:41:29 =================== sccache compilation log =================== 
Mar 19 12:41:29 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Mar 19 12:41:29 Compile requests                61 
Mar 19 12:41:29 Compile requests executed       22 
Mar 19 12:41:29 Cache hits                      21 
Mar 19 12:41:29 Cache misses                     0 
Mar 19 12:41:29 Cache timeouts                   0 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (3/4)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 19 13:23:07 .[E request_callback_impl.cpp:94] Received error while processing request type 0: The following operation failed in the TorchScript interpreter.
Mar 19 13:23:06 frame #3: <unknown function> + 0x9096a3 (0x7f3d152f46a3 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
Mar 19 13:23:06 frame #4: torch::distributed::rpc::RequestCallbackImpl::processMessage(torch::distributed::rpc::Message&) const + 0x3b2 (0x7f3d152f1452 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
Mar 19 13:23:06 frame #5: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&) const + 0x1e (0x7f3d11fc74ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so) 
Mar 19 13:23:06 frame #6: torch::distributed::rpc::ProcessGroupAgent::handleRecv(torch::distributed::rpc::RecvWork&) + 0xc5 (0x7f3d152d35c5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
Mar 19 13:23:06 frame #7: <unknown function> + 0x8e96c2 (0x7f3d152d46c2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so) 
Mar 19 13:23:06 frame #8: c10::ThreadPool::main_loop(unsigned long) + 0x2b3 (0x7f3d06640023 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so) 
Mar 19 13:23:06 frame #9: <unknown function> + 0xc819d (0x7f3d18ce219d in /opt/conda/lib/libstdc++.so.6) 
Mar 19 13:23:06 frame #10: <unknown function> + 0x76ba (0x7f3d4df966ba in /lib/x86_64-linux-gnu/libpthread.so.0) 
Mar 19 13:23:06 frame #11: clone + 0x6d (0x7f3d4dccc41d in /lib/x86_64-linux-gnu/libc.so.6) 
Mar 19 13:23:06  
Mar 19 13:23:07 .[E request_callback_impl.cpp:94] Received error while processing request type 0: The following operation failed in the TorchScript interpreter. 
Mar 19 13:23:07 Traceback of TorchScript (most recent call last): 
Mar 19 13:23:07   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/distributed/rpc/jit/rpc_test.py", line 209, in raise_script 
Mar 19 13:23:07 @torch.jit.script 
Mar 19 13:23:07 def raise_script(): 
Mar 19 13:23:07     raise RuntimeError("Expected error") 
Mar 19 13:23:07     ~~~~~~~~~~~~~~~~~~~~ <--- HERE 
Mar 19 13:23:07     return 0 
Mar 19 13:23:07 RuntimeError: Exception 
Mar 19 13:23:07  
Mar 19 13:23:23 ..................s... 

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test (4/4)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 19 12:02:09 caused by: Connection refused (os error 111)
Mar 19 12:02:09 +++ eval 'extract_trap_cmd ' 
Mar 19 12:02:09 ++++ extract_trap_cmd 
Mar 19 12:02:09 ++++ printf '%s\n' '' 
Mar 19 12:02:09 +++ printf '%s\n' cleanup 
Mar 19 12:02:09 ++ trap -- ' 
Mar 19 12:02:09 cleanup' EXIT 
Mar 19 12:02:09 ++ which sccache 
Mar 19 12:02:09 ++ sccache --stop-server 
Mar 19 12:02:09 Stopping sccache server... 
Mar 19 12:02:09 error: couldn't connect to server 
Mar 19 12:02:09 caused by: Connection refused (os error 111) 
Mar 19 12:02:09 ++ true 
Mar 19 12:02:09 ++ rm /var/lib/jenkins/sccache_error.log 
Mar 19 12:02:09 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Mar 19 12:02:09 ++ SCCACHE_IDLE_TIMEOUT=1200 
Mar 19 12:02:09 ++ RUST_LOG=sccache::server=error 
Mar 19 12:02:09 ++ sccache --start-server 
Mar 19 12:02:09 Starting sccache server... 
Mar 19 12:02:09 ++ sccache --zero-stats 
Mar 19 12:02:09 Compile requests                 0 
Mar 19 12:02:09 Compile requests executed        0 

This comment was automatically generated by Dr. CI.

@ngimel ngimel added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Mar 12, 2020
Collaborator

@rgommers rgommers left a comment

Still a bit too concise - the goal is really to make sure we get complete and consistent benchmarking results for future PRs that touch the dispatching code.

Comment thread benchmarks/overrides_benchmark/README.md Outdated
Comment thread benchmarks/overrides_benchmark/README.md Outdated
Comment thread benchmarks/overrides_benchmark/README.md Outdated
Comment thread benchmarks/overrides_benchmark/README.md Outdated
@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch from b78eaa0 to 042a084 Compare March 13, 2020 10:54
@hameerabbasi
Collaborator Author

The clang-tidy failure is related to the network.

@ezyang
Contributor

ezyang commented Mar 13, 2020

@ngoldbaum I'll let you review first!

Comment thread benchmarks/overrides_benchmark/README.md Outdated

* Run `bench.py`, and include the output in your result.
* For each case where `bench.py` shows a regression, run the commands described above, prefixing the output SVG filename (the input to the `-o` switch) with `base-` or `branch-` depending on the commit you are running the benchmark on.
* For each SVG, open it in the browser, take a screenshot and include it in your result. Also include a ZIP file with all SVGs thus produced included.
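The measurement loop behind these instructions can be sketched in plain Python. The `bench` helper below is hypothetical — it only mirrors the min/std-in-microseconds summary that `bench.py` reports, without any torch dependency:

```python
import statistics
import timeit

def bench(fn, n_repeats=100, n_calls=1000):
    """Time fn n_calls times per repeat; report (min, stdev) per call
    in microseconds -- the same summary shape bench.py prints."""
    totals = timeit.repeat(fn, number=n_calls, repeat=n_repeats)
    per_call = [t / n_calls for t in totals]
    return min(per_call) * 1e6, statistics.stdev(per_call) * 1e6

best, spread = bench(lambda: sum(range(10)))
print(f"min: {best:.3f} us, std: {spread:.3f} us")
```

Reporting the minimum rather than the mean is the usual choice for micro-benchmarks, since it is least affected by scheduler noise.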
Contributor

Couldn't this part be automated? Maybe rather than relying on the py-spy flame graph svg output it would make more sense to process the data from the "raw" output format to extract relevant information that can be copy/pasted into a PR discussion.

Collaborator Author

Are there docs on that format somewhere?

Contributor

It's mentioned in the output of `py-spy record --help`.

Collaborator Author

I mean the actual format that raw produces, is there a parser or some docs on it?

Contributor

No I don't think it's documented, py-spy's docs are pretty barebones. Looking at the py-spy source code it looks like it's writing the contents of the HashMap mapping stack frame names to counts to the output file, see:

https://github.com/benfred/py-spy/blob/d870b3768353001b31ee78129a79be99319b9bec/src/main.rs#L135-L145

https://github.com/benfred/py-spy/blob/d870b3768353001b31ee78129a79be99319b9bec/src/flamegraph.rs#L39-L42


The py-spy raw output is in the 'folded stack' format used by flamegraph.pl (https://github.com/brendangregg/FlameGraph#2-fold-stacks). The format is a text file with one stack trace and sample count per line, with each frame in the stack trace delimited by a ';'. FWIW, for benchmarking you can take a 'before' and 'after' version of these raw files and compute a differential flame graph.
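That folded-stack format is simple enough to parse directly. This sketch (function names hypothetical) aggregates sample counts per stack and computes the differential between a 'before' and 'after' profile:

```python
from collections import Counter

def parse_folded(text):
    """Parse folded-stack text: one 'frame;frame;... count' entry per line."""
    counts = Counter()
    for line in text.strip().splitlines():
        stack, _, n = line.rpartition(" ")
        counts[stack] += int(n)
    return counts

def differential(before, after):
    """Positive values mean more samples landed there after the change."""
    return {s: after.get(s, 0) - before.get(s, 0)
            for s in before.keys() | after.keys()}

base = parse_folded("main;torch.add 90\nmain;dispatch 10")
branch = parse_folded("main;torch.add 85\nmain;dispatch 25")
print(differential(base, branch))
```

The resulting dict could be copy/pasted into a PR discussion directly, without rasterizing any SVG.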

Collaborator Author

I looked into this. It would introduce another dependency to rasterize the SVG, and another to diff the flame graph. I wonder if it's worth it.

Collaborator Author

Not to mention auto checkout and auto builds -- which is bad.

Comment thread torch/csrc/utils/python_arg_parser.cpp Outdated
Comment thread benchmarks/overrides_benchmark/common.py
@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch from e0f1d95 to f3629ee Compare March 16, 2020 11:00
@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch from f3629ee to 17dc58b Compare March 16, 2020 11:01
@ezyang
Contributor

ezyang commented Mar 16, 2020

I'm happy to merge this when y'all are OK with it. Note that you should consider running the bench as part of our CI to make sure it doesn't bitrot. (Don't have to look at the numbers though)

@hameerabbasi
Collaborator Author

hameerabbasi commented Mar 18, 2020

Note that you should consider running the bench as part of our CI to make sure it doesn't bitrot. (Don't have to look at the numbers though)

Where would I make these changes? And is it okay if it takes O(10 s) to run all of them, or should I reduce the number of reps via command line arguments?

@ezyang
Contributor

ezyang commented Mar 18, 2020

Reduce the number of reps.

To add new scripts to CI look at .jenkins/pytorch/test.sh

@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch from d0de1c9 to ffe35e5 Compare March 19, 2020 11:23
@hameerabbasi hameerabbasi force-pushed the torch-function-benchmark branch from ffe35e5 to f5fcd1d Compare March 19, 2020 11:26
@hameerabbasi
Collaborator Author

CI failures seem completely unrelated.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hameerabbasi
Collaborator Author

Looks like internal tests are failing -- anything I can do to help fix this?

@ezyang
Contributor

ezyang commented Mar 26, 2020

they're fake, don't worry

@ezyang
Contributor

ezyang commented Mar 26, 2020

Non-ASCII characters broke tests:

Mar 26 20:13:22 + python bench.py -n 1 -m 1
Mar 26 20:13:22 ~/workspace/benchmarks/overrides_benchmark ~/workspace
Mar 26 20:13:22 Traceback (most recent call last):
Mar 26 20:13:22   File "bench.py", line 67, in <module>
Mar 26 20:13:22     main()
Mar 26 20:13:22   File "bench.py", line 61, in main
Mar 26 20:13:22     t.__name__, (10 ** 6) * bench_min, (10 ** 6) * bench_std,
Mar 26 20:13:22 UnicodeEncodeError: 'ascii' codec can't encode character '\u03bc' in position 54: ordinal not in range(128)

https://app.circleci.com/pipelines/github/pytorch/pytorch/146972/workflows/7596773d-d427-41ed-a007-48a282e155d1/jobs/4963663
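The failure above comes from printing the Greek mu in "μs" to an ASCII-only stdout. One way around it — the helper name here is hypothetical, not what the fix actually used — is to fall back to a plain ASCII label when the character cannot be encoded:

```python
def ascii_safe(label: str) -> str:
    """Return label unchanged if it is ASCII-encodable; otherwise
    replace the micro sign (U+03BC) with a plain 'u'."""
    try:
        label.encode("ascii")
        return label
    except UnicodeEncodeError:
        return label.replace("\u03bc", "u")

print(ascii_safe("\u03bcs"))  # prints "us" -- safe on an ASCII-only CI console
print(ascii_safe("ms"))       # prints "ms" -- already ASCII, left alone
```

Alternatively, CI could export `PYTHONIOENCODING=utf-8`, but keeping the script's output ASCII-safe avoids depending on the environment.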

@facebook-github-bot
Contributor

@ezyang merged this pull request in bf24753.

facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Re-land of #35530 and #34645
Pull Request resolved: #36138

Differential Revision: D20893770

Pulled By: ezyang

fbshipit-source-id: 75ab688a086f5fb87412a853df5246c0c39704ca
ashishfarmer pushed a commit to ashishfarmer/pytorch that referenced this pull request Apr 13, 2020
Summary:
Re-land of pytorch#35530 and pytorch#34645
Pull Request resolved: pytorch#36138

Differential Revision: D20893770

Pulled By: ezyang

fbshipit-source-id: 75ab688a086f5fb87412a853df5246c0c39704ca
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: Pull Request resolved: pytorch#34645

Differential Revision: D20653072

Pulled By: ezyang

fbshipit-source-id: e7e363f8a1b84fc0c354586e266a695e4a2ea60e
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Re-land of pytorch#35530 and pytorch#34645
Pull Request resolved: pytorch#36138

Differential Revision: D20893770

Pulled By: ezyang

fbshipit-source-id: 75ab688a086f5fb87412a853df5246c0c39704ca

Labels

Merged open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants