Update nadam.py #1

Merged
peterjc123 merged 2 commits into peterjc123:master from Jiaming-Liu:patch-2 on May 1, 2017

Conversation

@Jiaming-Liu

Save memory by exploiting in-place operations.

Commits:

* Save memory by exploiting in-place operations.
* Reformats my PR
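
In optimizer terms, the change swaps out-of-place tensor arithmetic, which allocates a fresh tensor on every step, for in-place variants that update the existing state buffers. A minimal sketch of the idea (hypothetical code using the current torch API, not the PR's exact diff):

```
import torch

beta1 = 0.9
grad = torch.randn(10)
exp_avg = torch.zeros(10)  # optimizer state buffer

# Out-of-place: builds a new tensor every step.
exp_avg = exp_avg * beta1 + grad * (1 - beta1)

# In-place: updates the existing buffer instead, saving memory.
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
```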
@Jiaming-Liu
Author

Tested on MNIST. Works slightly better (mysteriously).
[Screenshot "capture2": MNIST training curves. Green: before patch; blue: after patch.]

@peterjc123
Owner

Thank you for your modification and your pictures.

peterjc123 merged commit f6129b5 into peterjc123:master on May 1, 2017
happynear pushed a commit to happynear/pytorch that referenced this pull request May 30, 2017
peterjc123 added a commit that referenced this pull request Aug 30, 2017
peterjc123 added a commit that referenced this pull request Sep 4, 2017
peterjc123 added a commit that referenced this pull request Oct 4, 2017
peterjc123 added a commit that referenced this pull request Nov 7, 2017
peterjc123 pushed a commit that referenced this pull request Nov 8, 2017
peterjc123 pushed a commit that referenced this pull request Nov 8, 2017
peterjc123 added a commit that referenced this pull request Nov 14, 2017
peterjc123 added a commit that referenced this pull request Nov 14, 2017
* Fix CUDA 9 builds for Windows

* Add msvc conditional flag

* minor bug fix

* minor bugs #1
peterjc123 pushed a commit that referenced this pull request Jan 30, 2018
Currently, index operation kernels work in "source/destination index-major
order".  (E.g., if thread count equals slice size, each thread will process
slice #0 in lockstep, and then slice #1, and so on.)

However, when the elements inside each "slice" are separated by large strides (e.g.,
selecting columns of a matrix), it is better to switch to "elementInSlice-major
order".  For example, each thread can process element #0 of every slice, and
then element #1 of every slice, and so on.
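
To make the two orders concrete, here is a small Python sketch of how threads would walk a `num_slices x slice_size` index space (illustration only; the actual kernels are CUDA):

```
num_slices, slice_size = 4, 3

# "Index-major order": finish slice #0, then slice #1, and so on.
index_major = [(s, e) for s in range(num_slices) for e in range(slice_size)]

# "elementInSlice-major order": element #0 of every slice first,
# then element #1 of every slice, and so on.
elem_major = [(s, e) for e in range(slice_size) for s in range(num_slices)]

print(index_major[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
print(elem_major[:4])   # [(0, 0), (1, 0), (2, 0), (3, 0)]
```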
peterjc123 pushed a commit that referenced this pull request Feb 18, 2018
peterjc123 pushed a commit that referenced this pull request Sep 13, 2018
…orms we care about. (pytorch#11394)

Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb            w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
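
For reference, the same little-endian 32-bit decode expressed in Python (illustration only; the C++ code achieves it with the memcpy-then-byte-swap pattern that compilers collapse into the short sequence above):

```
import struct

def decode_uint32_le(buf: bytes) -> int:
    # Read the first four bytes of buf as an unsigned 32-bit
    # little-endian integer.
    return struct.unpack("<I", buf[:4])[0]

assert decode_uint32_le(b"\x01\x00\x00\x00") == 1
```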
Pull Request resolved: pytorch#11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
peterjc123 pushed a commit that referenced this pull request Nov 20, 2018
pytorch#14040)

Summary:
…2164)"

This reverts commit 4b7c615.
Pull Request resolved: pytorch#14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
peterjc123 pushed a commit that referenced this pull request Feb 26, 2019
Summary:
Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes loading of JIT model parameters to fail (pytorch/vision#728 (comment)):
```
terminate called after throwing an instance of 'c10::Error'
  what():  No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.

This is a BC-breaking change, but it should be easy for end users to rename `running_variance` to `running_var` at their call sites.
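
For reference, the Python-side buffer names the C++ module is being aligned with can be checked directly (illustrative snippet, not from the PR):

```
import torch.nn as nn

bn = nn.BatchNorm1d(4)
print(sorted(bn.state_dict().keys()))
# ['bias', 'num_batches_tracked', 'running_mean', 'running_var', 'weight']
```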
Pull Request resolved: pytorch#17371

Reviewed By: goldsborough

Differential Revision: D14172775

Pulled By: yf225

fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
peterjc123 pushed a commit that referenced this pull request Apr 9, 2019
Summary:
Tracing models that attempt to return this in-place value doesn't turn out well.

To be honest, I haven't run any tests to confirm the results, but regardless of the outcome the operation happens in-place, so it should behave as before.

Sample output from a traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
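
A hypothetical minimal reproducer of the setup described above (illustration only; the model and sizes are made up):

```
import torch
import torch.nn as nn

# max_norm makes Embedding call embedding_renorm_ in-place on its
# weight (a leaf parameter), which is what tracing tripped over here.
emb = nn.Embedding(num_embeddings=10, embedding_dim=3, max_norm=1.0)
idx = torch.tensor([1, 2, 3])
traced = torch.jit.trace(emb, idx)  # raised the in-place error before this fix
```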
Pull Request resolved: pytorch#18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d
peterjc123 pushed a commit that referenced this pull request Jun 6, 2019
Summary:
We have encountered a `std::bad_cast` error when running a PyTorch binary built with the cxx11 ABI on CentOS 7. Stack trace:
```
#0  0x00007fec10160207 in raise () from /lib64/libc.so.6
#1  0x00007fec101618f8 in abort () from /lib64/libc.so.6
#2  0x00007fec015767d5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3  0x00007fec01574746 in ?? () from /lib64/libstdc++.so.6
#4  0x00007fec01574773 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007fec01574993 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x00007fec015c94d2 in std::__throw_bad_cast() () from /lib64/libstdc++.so.6
#7  0x00007feb2ab3c2d7 in std::__cxx11::numpunct<char> const& std::use_facet<std::__cxx11::numpunct<char> >(std::locale const&) ()
   from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
#8  0x00007feb28643d62 in torch::jit::script::strtod_c(char const*, char**) () from /root/.local/lib/python2.7/site-packages/torch/lib/libcaffe2.so
```

We suspect this line gets compiled to a GCC-ABI-dependent symbol:
```
char decimal_point = std::use_facet<std::numpunct<char>>(std::locale()).decimal_point();
```
Pull Request resolved: pytorch#21293

Differential Revision: D15609910

Pulled By: bddppq

fbshipit-source-id: e247059729863868e4b36d6fec4fcbc36fbc4bb1
peterjc123 pushed a commit that referenced this pull request Jun 7, 2019
Summary:
Turing GPUs (compute capability 7.5) require CUDA10 to work properly.
We've seen some issues for these GPUs using PyTorch binaries with CUDA9 or older:
[Discussion Board #1](https://discuss.pytorch.org/t/cudnn-status-execution-failed-error/38575)
[Discussion Board #2](https://discuss.pytorch.org/t/cublas-runtime-error-on-gpu-running-but-works-on-cpu/46545/6)

Tested using CUDA9 with an RTX 2080Ti.
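
A quick way to check whether you are in the affected configuration (not from the PR, just a sanity check):

```
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")       # Turing reports 7.5
    print(f"built against CUDA: {torch.version.cuda}")  # Turing needs >= 10
```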
Pull Request resolved: pytorch#21468

Differential Revision: D15696170

Pulled By: ezyang

fbshipit-source-id: ed43f4e4948d3f97ec8e7d7952110cbbfeafef2a
peterjc123 pushed a commit that referenced this pull request May 6, 2020
Summary:
Pull Request resolved: pytorch#37101

Fixes pytorch#36954.

The basic concept is to streamline the process of rethrowing
c10::Error with extra error information.  This happens in a few
steps:

- I completely remodeled the Error data type and the internal
  invariants.  Instead of manually adding in newlines, the
  message stack formatting process is responsible for inserting
  newlines and spacing as necessary.  Call sites are then
  modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error
  message and then rethrows it.

New internal assert failure looks like:

```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```

Error message with context looks like:

```
This is an error
  This is context 1
  This is context 2
```
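
As a loose Python analogy of the TORCH_RETHROW idea (not the C++ API added in this PR): catch, attach context, and rethrow while keeping the original cause.

```
def load_checkpoint(path):
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as err:
        # Analogous to TORCH_RETHROW: attach context, keep the cause chained.
        raise RuntimeError(f"while loading checkpoint from {path}") from err
```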

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21202891

Pulled By: ezyang

fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
peterjc123 pushed a commit that referenced this pull request Oct 31, 2020
Summary:
Pull Request resolved: pytorch#46966

These tests had false positives in TSAN for modifying thread local
variables:

```
WARNING: ThreadSanitizer: data race (pid=5364)
  Write of size 8 at 0x7b2c0004ff70 by thread T2:
    #0 free <null> (libtools_build_sanitizers_tsan-py.so+0xde6ad)
    #1 __GI__dl_deallocate_tls

  Previous write of size 1 at 0x7b2c0004ff71 by thread T3:
    #0 at::GradMode::set_enabled(bool) caffe2/aten/src/ATen/core/grad_mode.cpp:20 (libcaffe2_ATen-core.so+0x40e013)
    #1 torch::autograd::set_grad_enabled(_object*, _object*) caffe2/torch/csrc/autograd/init.cpp:143 (libcaffe2__C_impl_cuda.so+0x115ef0e)
    #2 _PyMethodDef_RawFastCallKeywords

  Thread T3 (tid=5385, finished) created by main thread at:
    #0 pthread_create <null> (libtools_build_sanitizers_tsan-py.so+0xc5a86)
    #1 PyThread_start_new_thread
```
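For context, grad mode really is per-thread in PyTorch, which is what these tests exercise; a small sketch of the thread-local behavior (illustration, not the test code):

```
import threading
import torch

def worker():
    torch.set_grad_enabled(False)  # thread-local: affects only this thread
    print("worker:", torch.is_grad_enabled())  # False

t = threading.Thread(target=worker)
t.start()
t.join()
print("main:", torch.is_grad_enabled())  # True; main thread is unaffected
```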
ghstack-source-id: 115330433

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24584411

fbshipit-source-id: e35f704dfcb7b161a13a4902beaf8b1e41ccd596
pull Bot pushed a commit that referenced this pull request Sep 11, 2021
…ytorch#63339)

Summary:
Pull Request resolved: pytorch#63339

# Context
https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809

##### WHAT IS A STACK TRACE?
A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.

Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function.

We are enabling Android stack traces to help debugging on Android devices.
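
As a plain-Python illustration of the concept (an analogy only, not the Android plumbing in this diff):

```
import traceback

def inner():
    # Print a report of the active stack frames at this point in time.
    traceback.print_stack()

def outer():
    inner()

outer()  # frames from module level down to inner()
```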

Test Plan:
## Steps to test
```
buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64

one_world android emulator android-28

adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp

cd /data/local/tmp
./lite_predictorAndroid#android-x86_64

./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
```

## How the stack trace looks when the model file is not found:

### before
```
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true

Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
(no backtrace available)
Aborted
```

### after
```
134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
 frame #0       c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e]
 frame #1       [0x5949427b1eee]
 frame #2       [0x5949427b1eb2]
 frame #3       [0x5949427b1cdc]
 frame #4       std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34]
 frame #5       c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1]
 frame #6       c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f]
 frame #7       caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2]
 frame #8       caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903]
 frame #9       torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd]
 frame #10      torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769]
 frame #11      benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d]
 frame #12      main[0x594941882aff]
 frame #13      __libc_init[0x7b699d08578d]
```

### What we get on Linux
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
  what():  open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe]
frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6]
frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6)
frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)

Aborted (core dumped)
```

Reviewed By: dhruvbird

Differential Revision: D30135947

fbshipit-source-id: f50c634ef4545843305cad4b4a14a8776b1aec76
pull Bot pushed a commit that referenced this pull request Sep 11, 2021
…4332)

Summary:
Pull Request resolved: pytorch#64332

With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a C++ stack trace leading to the exception, rather than just a terse message.  E.g.,
```
RuntimeError: UNSUPPORTED DTYPE
Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
```

Test Plan: Imported from OSS

Reviewed By: huiguoo

Differential Revision: D30745610

Pulled By: bertmaher

fbshipit-source-id: a1cfaa7364ef4120de834e9cbe57ced1d082ab4e
pull Bot pushed a commit that referenced this pull request Oct 8, 2021
Summary:
Pull Request resolved: pytorch#66009

Fixes
```
test_trace_c10_ops (jit.test_tracer.TestTracer) ... third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374:24: runtime error: applying non-zero offset 4 to null pointer
    #0 0x7f5228f72227 in Eigen::internal::BlockImpl_dense<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, true>::BlockImpl_dense(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:374
    #1 0x7f5228f7212c in Eigen::BlockImpl<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false, Eigen::Dense>::BlockImpl(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:166
    #2 0x7f5228f720dc in Eigen::Block<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, -1, -1, false>::Block(Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >&, long, long, long, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/Block.h:142
    #3 0x7f5229b0e059 in Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::FixedBlockXpr<internal::get_fixed_value<int>::value, internal::get_fixed_value<long>::value>::Type Eigen::DenseBase<Eigen::Map<Eigen::Array<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> > >::block<int, long>(long, long, int, long) third-party-buck/platform009/build/eigen/include/Eigen/src/Core/../plugins/BlockMethods.h:98
    #4 0x7f5229b0c5ca in caffe2::GenerateProposalsOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/generate_proposals_op.cc:348
```
Also cleans up some data type and const issues around the area.

Test Plan: Sandcastle

Reviewed By: xush6528

Differential Revision: D31343046

fbshipit-source-id: fd9096c8e47a0aad529c72fd313f64ca98dcb80b
pull Bot pushed a commit that referenced this pull request Oct 8, 2021
Summary:
Pull Request resolved: pytorch#66060

Fixes
```
testTumHistoryAdditionalLaser (caffe2.caffe2.fb.layers.tests.tum_history_test.TestTumHistory) ... caffe2/caffe2/operators/concat_split_op.h:363:74: runtime error: applying non-zero offset 8 to null pointer
    #0 0x7f8f39d29795 in caffe2::ConcatOp<caffe2::CPUContext>::RunOnDevice() caffe2/caffe2/operators/concat_split_op.h:363
    #1 0x7f8f39c4978d in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:987
    #2 0x7f8f381fe9c9 in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:67
    #3 0x7f8f38ee488e in caffe2::Workspace::RunNet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) caffe2/caffe2/core/workspace.cc:289
```

Test Plan: Sandcastle

Reviewed By: dzhulgakov, xush6528

Differential Revision: D31366205

fbshipit-source-id: 566aa519677c9d371189e4b1f81d595732861efc