Throwing the simple_ir into the fuser branch. by jjsjann123 · Pull Request #1 · csarofeen/pytorch

jjsjann123 · 2020-02-05T02:08:38Z

No description provided.

Summary: Pull Request resolved: pytorch#37101 Fixes pytorch#36954. The basic concept is to streamline the process of rethrowing c10::Error with extra error information. This is in a few steps: - I completely remodeled the Error data type and the internal invariants. Instead of manually adding in newlines, the message stack formatting process is responsible for inserting newlines and spacing as necessary. Call sites are then modified to respect the new API model. - TORCH_RETHROW macro is added, which adds context to an error message and then rethrows it. New internal assert failure looks like: ``` 0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch. Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first): frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so) frame #1: ... ``` Error message with context looks like: ``` This is an error This is context 1 This is context 2 ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21202891 Pulled By: ezyang fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169

Handle expressions with missing inputs as needed

Summary: Pull Request resolved: pytorch#46966 These tests had false positives in TSAN for modifying thread local variables: ``` WARNING: ThreadSanitizer: data race (pid=5364) Write of size 8 at 0x7b2c0004ff70 by thread T2: #0 free <null> (libtools_build_sanitizers_tsan-py.so+0xde6ad) #1 __GI__dl_deallocate_tls Previous write of size 1 at 0x7b2c0004ff71 by thread T3: #0 at::GradMode::set_enabled(bool) caffe2/aten/src/ATen/core/grad_mode.cpp:20 (libcaffe2_ATen-core.so+0x40e013) #1 torch::autograd::set_grad_enabled(_object*, _object*) caffe2/torch/csrc/autograd/init.cpp:143 (libcaffe2__C_impl_cuda.so+0x115ef0e) #2 _PyMethodDef_RawFastCallKeywords Thread T3 (tid=5385, finished) created by main thread at: #0 pthread_create <null> (libtools_build_sanitizers_tsan-py.so+0xc5a86) #1 PyThread_start_new_thread ``` ghstack-source-id: 115330433 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D24584411 fbshipit-source-id: e35f704dfcb7b161a13a4902beaf8b1e41ccd596

Summary: added more statistic info for static runtime Test Plan: caffe2/benchmarks/static_runtime:static_runtime_cpptest Expected output example: Static runtime ms per iter: 0.939483. Iters per second: 1064.41 Node #0: 0.195671 ms/iter, %wide_offset.1 : Tensor = aten::add(%wide.1, %self._mu, %4) Node #1: 0.169457 ms/iter, %wide_normalized.1 : Tensor = aten::mul(%wide_offset.1, %self._sigma) Node #2: 0.118218 ms/iter, %wide_preproc.1 : Tensor = aten::clamp(%wide_normalized.1, %5, %6) Node #3: 0.038814 ms/iter, %user_emb_t.1 : Tensor = aten::transpose(%user_emb.1, %4, %7) Node #4: 0.0860747 ms/iter, %dp_unflatten.1 : Tensor = aten::bmm(%ad_emb_packed.1, %user_emb_t.1) Node #5: 0.0102666 ms/iter, %31 : Tensor = static_runtime::flatten_copy(%dp_unflatten.1, %4, %8) Node #6: 0.000476333 ms/iter, %19 : Tensor[] = prim::ListConstruct(%31, %wide_preproc.1) Node #7: 0.0707332 ms/iter, %input.1 : Tensor = aten::cat(%19, %4) Node #8: 0.123695 ms/iter, %fc1.1 : Tensor = aten::addmm(%self._fc_b, %input.1, %29, %4, %4) Node #9: 0.0309244 ms/iter, %23 : Tensor = aten::sigmoid(%fc1.1) Node #10: 0.0046297 ms/iter, %24 : (Tensor) = prim::TupleConstruct(%23) Time per node type: 0.195671 ms. 23.0483%. aten::add (1 nodes) 0.169457 ms. 19.9605%. aten::mul (1 nodes, out variant) 0.123695 ms. 14.5702%. aten::addmm (1 nodes, out variant) 0.118218 ms. 13.925%. aten::clamp (1 nodes, out variant) 0.0860747 ms. 10.1388%. aten::bmm (1 nodes, out variant) 0.0707332 ms. 8.33175%. aten::cat (1 nodes, out variant) 0.038814 ms. 4.57195%. aten::transpose (1 nodes) 0.0309244 ms. 3.64263%. aten::sigmoid (1 nodes, out variant) 0.0102666 ms. 1.20932%. static_runtime::flatten_copy (1 nodes, out variant) 0.0046297 ms. 0.545338%. prim::TupleConstruct (1 nodes, out variant) 0.000476333 ms. 0.0561079%. prim::ListConstruct (1 nodes, out variant) 0.848959 ms. in Total StaticRuntime setup time: 0.018925 ms Memory allocation time: 0.019808 ms Memory deallocation time: 0.0120445 ms Outputs deallocation time: 0.0864947 ms Total memory managed: 19328 bytes Total number of reused tensors: 3 Total number of 'out' variant nodes/total number of nodes: 9/11 (81.8182%) Reviewed By: hlu1 Differential Revision: D28553029 fbshipit-source-id: 55e7eab50b4b475ae219896100bdf4f6678875a4

Summary: Pull Request resolved: pytorch#60987 We were seeing deadlocks as follows during shutdown: ``` Thread 1 (LWP 2432101): #0 0x00007efca470190b in __pause_nocancel () from /lib64/libc.so.6 #1 0x00007efca49de485 in __pthread_mutex_lock_full () from /lib64/libpthread.so.0 #2 0x00007ef91d4c42c6 in __cuda_CallJitEntryPoint () from /lib64/libnvidia-ptxjitcompiler.so.1 #3 0x00007efc651ac8f1 in ?? () from /lib64/libcuda.so #4 0x00007efc651aee03 in ?? () from /lib64/libcuda.so #5 0x00007efc64f76b84 in ?? () from /lib64/libcuda.so #6 0x00007efc64f77f5d in ?? () from /lib64/libcuda.so #7 0x00007efc64eac858 in ?? () from /lib64/libcuda.so #8 0x00007efc64eacfbc in ?? () from /lib64/libcuda.so #9 0x00007efc7810a924 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #10 0x00007efc780fa2be in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #11 0x00007efc78111044 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #12 0x00007efc7811580a in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #13 0x00007efc78115aa4 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #14 0x00007efc781079ec in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #15 0x00007efc780e6a7a in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #16 0x00007efc7811cfa5 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #17 0x00007efc777ea98c in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #18 0x00007efc777ebd80 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #19 0x00007efc777ea2c9 in ?? () from /usr/local/cuda/lib64/libcublas.so.11 #20 0x00007efc778c2e2d in cublasDestroy_v2 () from /usr/local/cuda/lib64/libcublas.so.11 #21 0x00007efc51a3fb56 in std::_Sp_counted_ptr_inplace<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle>, std::allocator<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so #22 0x00007efc51a3fc5f in std::shared_ptr<at::cuda::(anonymous namespace)::DeviceThreadHandlePool<cublasContext*, &at::cuda::(anonymous namespace)::createCublasHandle, &at::cuda::(anonymous namespace)::destroyCublasHandle> >::~shared_ptr() () from /data/users/pritam/pytorch/torch/lib/libtorch_cuda.so #23 0x00007efca4648b0c in __run_exit_handlers () from /lib64/libc.so.6 #24 0x00007efca4648c40 in exit () from /lib64/libc.so.6 #25 0x0000558c8852e5f9 in Py_Exit (sts=0) at /tmp/build/80754af9/python_1614362349910/work/Python/pylifecycle.c:2292 #26 0x0000558c8852e6a7 in handle_system_exit () at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:636 #27 0x0000558c8852e742 in PyErr_PrintEx (set_sys_last_vars=<optimized out>, set_sys_last_vars=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:646 #28 0x0000558c88540dd6 in PyRun_SimpleStringFlags (command=0x7efca4dc9050 "from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=9, pipe_handle=13)\n", flags=0x7ffe3a986110) at /tmp/build/80754af9/python_1614362349910/work/Python/pythonrun.c:457 #29 0x0000558c88540ead in pymain_run_command (cf=0x7ffe3a986110, command=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:420 #30 pymain_run_python (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:2907 #31 pymain_main (pymain=0x7ffe3a986220) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3460 #32 0x0000558c8854122c in _Py_UnixMain (argc=<optimized out>, argv=<optimized out>) at /tmp/build/80754af9/python_1614362349910/work/Modules/main.c:3495 #33 0x00007efca4632493 in __libc_start_main () from /lib64/libc.so.6 #34 0x0000558c884e5e90 in _start () at ../sysdeps/x86_64/elf/start.S:103 ``` This was likely caused due to a static singleton that wasn't leaky. Following the guidance in https://isocpp.org/wiki/faq/ctors#construct-on-first-use-v2 to use a leaky singleton instead. ghstack-source-id: 132847448 Test Plan: Verified locally. Reviewed By: malfet Differential Revision: D29468866 fbshipit-source-id: 89250594c5cd2643417b1da584c658b742dc5a5c

Summary: Pull Request resolved: pytorch#61588 As part of debugging pytorch#60290, we discovered the following deadlock: ``` Thread 79 (Thread 0x7f52ff7fe700 (LWP 205437)): #0 pthread_cond_timedwait@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225 #1 0x0000564880199152 in PyCOND_TIMEDWAIT (cond=0x564880346080 <gil_cond>, mut=0x564880346100 <gil_mutex>, us=5000) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/condvar.h:103 #2 take_gil (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval_gil.h:224 #3 0x0000564880217b62 in PyEval_AcquireThread (tstate=0x7f5254005ef0) at /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:278 #4 0x00007f557d54aabd in pybind11::gil_scoped_acquire::gil_scoped_acquire() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so #5 0x00007f557da7792f in (anonymous namespace)::concrete_decref_fn(c10::impl::PyInterpreter const*, _object*) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so #6 0x00007f5560dadba6 in c10::TensorImpl::release_resources() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so #7 0x00007f5574c885bc in std::_Sp_counted_ptr_inplace<torch::distributed::autograd::DistAutogradContext, std::allocator<torch::distributed::autograd::DistAutogradContext>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #8 0x00007f5574c815e9 in std::__detail::_Hashtable_alloc<std::allocator<std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false> > >::_M_deallocate_node(std::__detail::_Hash_node<std::pair<long const, std::shared_ptr<torch::distributed::autograd::DistAutogradContext> >, false>*) [clone .isra.325] () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #9 0x00007f5574c81bf1 in torch::distributed::autograd::DistAutogradContainer::eraseContextIdAndReset(torch::distributed::autograd::DistAutogradContainer::ContextsShard&, long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #10 0x00007f5574c86e83 in torch::distributed::autograd::DistAutogradContainer::releaseContextIfPresent(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #11 0x00007f5574cc6395 in torch::distributed::rpc::RequestCallbackNoPython::processCleanupAutogradContextReq(torch::distributed::rpc::RpcCommandBase&) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #12 0x00007f5574cccf15 in torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so Thread 72 (Thread 0x7f53077fe700 (LWP 205412)): #0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 #1 0x00007f55bc62adbd in __GI___pthread_mutex_lock (mutex=0x564884396440) at ../nptl/pthread_mutex_lock.c:80 #2 0x00007f5574c82a2f in torch::distributed::autograd::DistAutogradContainer::retrieveContext(long) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so #3 0x00007f557de9bb2f in pybind11::cpp_function::initialize<torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}, pybind11::dict, long, pybind11::name, pybind11::scope, pybind11::sibling, char [931], pybind11::arg>(torch::distributed::autograd::(anonymous namespace)::dist_autograd_init(_object*, _object*)::{lambda(long)#11}&&, pybind11::dict (*)(long), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [931], pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) () from /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so ``` Basically Thread 72, holds GIL and tries to acquire the lock for DistAutogradContainer to perform a lookup on a map. On the other hand, Thread 79 holds the lock on DistAutogradContainer to remove a Tensor and as part of TensorImpl destructor, concrete_decref_fn is called which waits for GIL. As a result, we have a deadlock. To fix this issue, I've ensured we release GIL when we call `retrieveContext` and acquire it later when needed. ghstack-source-id: 133493659 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D29682624 fbshipit-source-id: f68a1fb39040ca0447a26e456a97bce64af6b79c

…ytorch#63339) Summary: Pull Request resolved: pytorch#63339 # Context https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809 ##### WHAT IS A STACK TRACE? A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program. Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function. We are enabling android stack trace to help debugging on android devices. Test Plan: ## Steps to test ``` buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64 one_world android emulator android-28 adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp cd /data/local/tmp ./lite_predictorAndroid#android-x86_64 ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true ``` ## See how model file is not found stack traces is: ### before ``` ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true Run with 2 threads Run with 2 threads Loading model... terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first): (no backtrace available) Aborted ``` ### after ``` 134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true Run with 2 threads Run with 2 threads Loading model... terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first): frame #0 c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e] frame #1 [0x5949427b1eee] frame #2 [0x5949427b1eb2] frame #3 [0x5949427b1cdc] frame #4 std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34] frame #5 c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1] frame #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f] frame #7 caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2] frame #8 caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903] frame #9 torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd] frame #10 torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769] frame #11 benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d] frame #12 main[0x594941882aff] frame #13 __libc_init[0x7b699d08578d] ``` ### what we get for os:linux ``` (base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true Run with 24 threads Run with 24 threads Loading model... terminate called after throwing an instance of 'c10::Error' what(): open file failed, file path: ./detect.bc Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first): frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe] frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6] frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6) frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor) Aborted (core dumped) ```` Reviewed By: dhruvbird Differential Revision: D30135947 fbshipit-source-id: f50c634ef4545843305cad4b4a14a8776b1aec76

…4332) Summary: Pull Request resolved: pytorch#64332 With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a c++ stacktrace leading to the exception, rather than just a terse message. E.g., ``` RuntimeError: UNSUPPORTED DTYPE Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/c\ onda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so) frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so) frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda\ .so) ``` Test Plan: Imported from OSS Reviewed By: huiguoo Differential Revision: D30745610 Pulled By: bertmaher fbshipit-source-id: a1cfaa7364ef4120de834e9cbe57ced1d082ab4e

Summary: Pull Request resolved: pytorch#66009 Fixes ``` test_trace_c10_ops (jit.test_tracer.TestTracer) ... third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374:24: runtime error: applying non-zero offset 4 to null pointer #0 0x7f5228f72227 in Eigen::internal::BlockImpl_dense<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, true>::BlockImpl_dense(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374 #1 0x7f5228f7212c in Eigen::BlockImpl<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, Eigen::Dense>::BlockImpl(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:166 #2 0x7f5228f720dc in Eigen::Block<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false>::Block(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:142 #3 0x7f5229b0e059 in Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::FixedBlockXpr<internal::get_fixed_value<int>::value, internal::get_fixed_value<long>::value>::Type Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::block<int, long>(long, long, int, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/../plugins/BlockMethods.h:98 #4 0x7f5229b0c5ca in caffe2::GenerateProposalsOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/generate_proposals_op.cc:348 ``` Also cleans up some data type and const issues around the area. Test Plan: Sandcastle Reviewed By: xush6528 Differential Revision: D31343046 fbshipit-source-id: fd9096c8e47a0aad529c72fd313f64ca98dcb80b

Summary: Pull Request resolved: pytorch#66060 Fixes ``` testTumHistoryAdditionalLaser (caffe2.caffe2.fb.layers.tests.tum_history_test.TestTumHistory) ... caffe2/caffe2/operators/concat_split_op.h:363:74: runtime error: applying non-zero offset 8 to null pointer #0 0x7f8f39d29795 in caffe2::ConcatOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/concat_split_op.h:363 #1 0x7f8f39c4978d in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:987 #2 0x7f8f381fe9c9 in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:67 #3 0x7f8f38ee488e in caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) caffe2/caffe2/core/workspace.cc:289 ``` Test Plan: Sandcastle Reviewed By: dzhulgakov, xush6528 Differential Revision: D31366205 fbshipit-source-id: 566aa519677c9d371189e4b1f81d595732861efc

…78136) This prevents `import torch` accidentally crash on machines with no metal devices Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true Backtrace to the crash: ``` (lldb) bt * thread #1, stop reason = signal SIGSTOP * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23 frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436 frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125 frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535 frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40(lldb) up frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl: -> 0x10fd9f524 <+436>: movq %rax, 0x1b0(%rbx) 0x10fd9f52b <+443>: movw $0x0, 0x1b8(%rbx) 0x10fd9f534 <+452>: addq $0x8, %rsp 0x10fd9f538 <+456>: popq %rbx (lldb) disassemble ... 0x10fd9f514 <+420>: movq 0xf19ad15(%rip), %rsi ; "maxBufferLength" 0x10fd9f51b <+427>: movq %r14, %rdi 0x10fd9f51e <+430>: callq *0xeaa326c(%rip) ; (void *)0x00007fff7202be40: objc_msgSend ``` which corresponds to `[m_device maxBufferLength]` call, where `m_device` is not initialized in https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171 Pull Request resolved: pytorch#78136 Approved by: https://github.com/seemethere

… of libtorch_python (pytorch#78028) Summary: This moves torch::class_<WorkerInfo> into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work to getting torch::deploy to load an unmodified copy of libtorch. Current RPC is incompatible due to duplicate registrations. ``` unknown file: Failure C++ exception with description "Exception Caught inside torch::deploy embedded library: Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once. Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so) frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so) frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so) frame #4: [0x7f3ba45cfea1] frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy) frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy) frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy) frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg) frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg) frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy) frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy) frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy) frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy) frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy) frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy) frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy) frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy) frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy) frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy) frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy) frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy) frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy) frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy) frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy) frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy) frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy) frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy) frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6) frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6) frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy) ``` Test Plan: CI Differential Revision: D36564258 Pull Request resolved: pytorch#78028 Approved by: https://github.com/rohan-varma

…ytorch#78276) Fixes #325 **Summary**: Currently, the pytorchbot only allows for rebasing to the master branch. These modifications add functionality for rebasing to the 'viable/strict' branch of pytorch/pytorch by adding a flag to the comment. **Test Plan:** tested manually on personal fork ([#1](swang392#1)), and included a test case in test_tryrebase.py that checks if rebasing to viable/strict branch was successful. Pull Request resolved: pytorch#78276 Approved by: https://github.com/clee2000, https://github.com/janeyx99

… to conform with non-quantized countertpart filenames Summary: Names of analogous files in quantized directory (previously snake case) were inconsistent with their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes all files in quantized (and sub-directories) dir to have pascal case. `aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR. ``` terminate called after throwing an instance of 'c10::Error' what(): Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types. Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so) frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so) frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so) frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2) ..........................truncated............. ``` Test Plan: ``` python test/test_quantization.py ``` Pull Request resolved: pytorch#77037 Approved by: https://github.com/jerryzh168

### Summary: This PR implements PTQ for APoT FakeQuant. It runs models (Resnet-18 pre-trained model, ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight. According to the collected accuracy stats, model #2 (uniform activation and APoT weight) appears to have a slight improvement in accuracy compared to model #1 (uniform activation and uniform weight) for 8-bit and significant improvement for 4-bit (see "Accuracy Stats" section below). ### Test Plan: Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py` ### Accuracy Stats: 8-bit (Uniform int8, APoT b = 8 k = 2) **Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 64.43% (Top-1), 85.62% (Top-5) **Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 64.51% (Top-1), 85.78% (Top-5) **Model #3:** APoT activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 64.32% (Top-1), 85.78% (Top-5) 4-bit (Uniform int4, APoT b = 4 k = 2) **Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 45.63% (Top-1), 71.96% (Top-5) **Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 64.24% (Top-1), 85.56% (Top-5) **Model #3:** APoT activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 45.40% (Top-1), 76.21% (Top-5) **Full Precision model (FX Graph Mode quantized)** Evaluation accuracy on test dataset: 69.76% (Top-1), 89.08% (Top-5) **Eager mode quantized model** Evaluation accuracy on test dataset: 69.49% (Top-1), 88.90% (Top-5) Pull Request resolved: pytorch#81040 Approved by: https://github.com/jerryzh168

Hi! I was playing with libfuzzer and found bug when loading a model from file via `torch::jit::load` function. There is an unhandled exception in caffe2/serialize when calling a `stoull` function on unsanitized version string. The bug can be reproduced with `aot_model_compiler` binary: ``` aot_model_compiler --model=crash-stoull --model_name=name --model_version=1 --input_dims='1,3,224,224;2,2' --input_types='float;float' ``` Crash file is provided in [crash.zip](https://github.com/pytorch/pytorch/files/8701504/crash.zip). gdb output: ``` Temporary breakpoint 1, main (argc=6, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:87 87 "Run NNC AOT compiler for pytorch model. Example usage:\n" (gdb) c Continuing. terminate called after throwing an instance of 'std::invalid_argument' what(): stoull Program received signal SIGABRT, Aborted. __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007fa637f16859 in __GI_abort () at abort.c:79 #2 0x00007fa6381c1911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x00007fa6381cd38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6 #4 0x00007fa6381cd3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6 #5 0x00007fa6381cd6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6 #6 0x00007fa6381c42ce in std::__throw_invalid_argument(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6 #7 0x000000000247d567 in __gnu_cxx::__stoa<unsigned long long, unsigned long long, char, int> (__str=0x7ffcd160f228 "ZZ", __idx=0x0, __base=10, __convf=<optimized out>, __name=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/string_conversions.h:83 #8 std::__cxx11::stoull (__str="ZZ", __idx=0x0, __base=10) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6577 #9 caffe2::serialize::PyTorchStreamReader::init (this=this@entry=0x8c11ce0) at /pytorch_master/caffe2/serialize/inline_container.cc:145 #10 0x000000000247d9c7 in caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader (this=0x8c11ce0, in=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}) at /pytorch_master/caffe2/serialize/inline_container.cc:88 #11 0x00000000035b7ba4 in __gnu_cxx::new_allocator<caffe2::serialize::PyTorchStreamReader>::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > ( __p=0x2, __args=..., this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/new_allocator.h:150 #12 std::allocator_traits<std::allocator<caffe2::serialize::PyTorchStreamReader> >::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__a=..., __p=0x2, __p@entry=0x8c11ce0, __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:512 #13 0x00000000035b1988 in std::_Sp_counted_ptr_inplace<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x8c11cd0, __a=..., __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:551 #14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a8, __p=@0x7ffcd160f3a0: 0x10, __args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:683 #15 std::__shared_ptr<caffe2::serialize::PyTorchStreamReader, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1371 #16 std::shared_ptr<caffe2::serialize::PyTorchStreamReader>::shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:408 #17 std::allocate_shared<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:859 #18 std::make_shared<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:875 #19 torch::jit::load (rai=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}, device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.: extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:474 #20 0x00000000035b1ef6 in torch::jit::load (filename="crash-stoull", device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.: extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:444 #21 0x00000000035b1d22 in torch::jit::load (filename="", device=device@entry=...) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:424 #22 0x00000000008f9be3 in main (argc=1, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:128 ``` Pull Request resolved: pytorch#77557 Approved by: https://github.com/Gamrix

### Summary: This PR implements QAT for APoT FakeQuant. It runs QAT with FX graph mode quantized models (Resnet-18 pre-trained model, full ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight. It also refactors the APoT PTQ module `apot_fx_graph_mode_ptq.py` (previously `fx_graph_mode_apot.py`) such that shared helper functions between PTQ and QAT are in a separate file `quantization_util.py`. Model #2 (uniformly quantized activation, APoT quantized weight) shows comparable accuracy compared to model #1 (uniformly quantized activation, APoT quantized weight) for 8-bit and significant accuracy improvement for 4-bit (see "Accuracy Stats" section below). ### Test Plan: Run QAT models with: `python test/quantization/core/experimental/apot_qat.py` Run PTQ models with: `python test/quantization/core/experimental/apot_ptq.py` ### Accuracy Stats 8-bit (Uniform int8, APoT b = 8 k = 2) Model #1: Uniform activation, uniform weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 69.67% (Top-1), 89.04% (Top-5) Model #2: Uniform activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 69.72% (Top-1), 89.06% (Top-5) 4-bit (Uniform int4, APoT b = 4 k = 2) Model #1: Uniform activation, uniform weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 46.85% (Top-1), 72.85% (Top-5) Model #2: Uniform activation, APoT weight (FX Graph Mode quantized) Evaluation accuracy on test dataset: 66.45% (Top-1), 86.23% (Top-5) Pull Request resolved: pytorch#83282 Approved by: https://github.com/jerryzh168

Summary: The deleter of the operator's unique_ptr doesn't get called unless the unique_ptr is created after the op has been created This fixes the problem reported in https://fb.workplace.com/groups/pytorch.edge.users/posts/1210708329799458/ Test Plan: # Testing memory leak fix **With test code added in D41487340:** ``` cd ~/fbsource/xplat buck run caffe2/aten/src/ATen/native/quantized/cpu/qsoftmax_test:qsoftmax_test ``` Before this diff: ``` ==2060866==ERROR: LeakSanitizer: detected memory leaks Direct leak of 608 byte(s) in 1 object(s) allocated from: #0 0x41bcd27 in calloc (/data/users/salilsdesai/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/aten/src/ATen/native/quantized/cpu/qsoftmax_test/qsoftmax_test+0x41bcd27) #1 0x405b692 in pytorch_qnnp_create_softargmax_nc_q8 xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/src/softargmax.c:77 Indirect leak of 1024 byte(s) in 1 object(s) allocated from: #0 0x41bcb7f in malloc (/data/users/salilsdesai/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/aten/src/ATen/native/quantized/cpu/qsoftmax_test/qsoftmax_test+0x41bcb7f) #1 0x405b6a8 in pytorch_qnnp_create_softargmax_nc_q8 xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack/src/softargmax.c:85 SUMMARY- AddressSanitizer: 1632 byte(s) leaked in 2 allocation(s). ``` After this diff: - No errors ___ # Testing op correctness ``` cd ~/fbsource/fbcode buck test caffe2/test/quantization:quantization -- test_qsoftmax ``` Passes - https://www.internalfb.com/intern/testinfra/testconsole/testrun/2814749908834332/ Differential Revision: D41487341 Pull Request resolved: pytorch#89544 Approved by: https://github.com/mcr229

jjsjann123 added 3 commits February 4, 2020 17:45

it builds!

ea80847

codegen compiled and working (with hardcoded stuff)

adab80c

multiple template parameter

cea8b4a

csarofeen merged commit bc01d3b into csarofeen:pytorch_fusion Feb 5, 2020

rdspring1 added a commit that referenced this pull request Oct 13, 2020

Fix reorderGroupBreadthFirst #1

3f6bd98

Handle expressions with missing inputs as needed

tlemo mentioned this pull request Jan 7, 2021

[CUDA graphs] [JIT] Capture-safe RNG in nvfuser #593

Merged

csarofeen mentioned this pull request Mar 24, 2021

Misaligned Vectorization Support #731

Merged

4 tasks

naoyam mentioned this pull request Nov 30, 2021

[DRAFT] Initial mma integration #1274

Closed

7 tasks

naoyam mentioned this pull request Feb 15, 2022

Mma operator and volta mma integration #1439

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Throwing the simple_ir into the fuser branch.#1

Throwing the simple_ir into the fuser branch.#1
csarofeen merged 3 commits intocsarofeen:pytorch_fusionfrom
jjsjann123:pytorch_fusion_cs_repo

jjsjann123 commented Feb 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jjsjann123 commented Feb 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants