
Fix the issue with the rapid growth in memory allocations in Binary Ops (#77753) #2

Merged
razarmehr merged 2 commits into mps_master from razarmehr/binaryops_fixes
May 20, 2022
Conversation

@razarmehr

This fix should improve the performance in many benchmarks as well.

@DmitryRizshkovg
Collaborator

Can confirm: on my test model, memory allocation is now flat during training.

Owner

@kulinseth kulinseth left a comment


Just a nit to be fixed. You can merge it after.

```cpp
MPSGraphTensorData* getMPSGraphTensorData(MPSGraph* mpsGraph, MPSStream* mpsStream, const Tensor& tensor);
MPSGraphTensorData* newMPSTensorFromScalar(MPSStream* mpsStream, const Scalar& scalar, MPSDataType dataType);
```
Owner


Can we rename this to `getMPSGraphTensorFromScalar` to be consistent?

Author


Done

@razarmehr razarmehr merged this pull request into mps_master May 20, 2022
@razarmehr razarmehr deleted the razarmehr/binaryops_fixes branch May 20, 2022 20:12
kulinseth pushed a commit that referenced this pull request May 24, 2022
This makes the rocm jobs run on master-only. We've been battling queue
times for a few months now
(pytorch#73039). So far we have tried
or investigated:
1. Moving distributed builds to master
2. Moving distributed builds to periodic
3. Only running rocm on a specific set of paths
4. Running multiple jobs on a single rocm host.

Unfortunately, we haven't been able to reduce queuing times to good
levels. As a result, ROCm jobs are the "weightiest" job in PR CI, with
an average TTS of 3.3h (see https://hud.pytorch.org/metrics, panel name
"Job time-to-signal, all branches").

There are two things we haven't tried so far:
1. Running "smoke tests" only on PR
2. Switching rocm builds to master

Since #2 is the easiest, let's give it a try. For now, the policy would be
the same as what we do for other capacity-constrained configurations
(Win and Mac): run on master only, but revert if a breakage is
introduced.

[skip ci]

Pull Request resolved: pytorch#77989

Approved by: https://github.com/malfet, https://github.com/janeyx99
kulinseth pushed a commit that referenced this pull request May 24, 2022
…78136)

This prevents `import torch` from accidentally crashing on machines with no Metal devices

Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq   %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw   $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq   $0x8, %rsp
    0x10fd9f538 <+456>: popq   %rbx
(lldb) disassemble
 ...
    0x10fd9f514 <+420>: movq   0xf19ad15(%rip), %rsi     ; "maxBufferLength"
    0x10fd9f51b <+427>: movq   %r14, %rdi
    0x10fd9f51e <+430>: callq  *0xeaa326c(%rip)          ; (void *)0x00007fff7202be40: objc_msgSend
```

which corresponds to `[m_device maxBufferLength]` call, where `m_device` is not initialized in
https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171
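The failure mode above can be sketched in plain C++ (all names here are hypothetical stand-ins for the Metal API, and a null pointer stands in for the uninitialized `m_device`): the fix is to guard the device query so construction succeeds at static-initialization time on machines with no device.

```cpp
#include <cstddef>

// Hypothetical stand-in for an MTLDevice handle; not the real Metal API.
struct Device {
    std::size_t maxBufferLength() const { return std::size_t(1) << 30; }
};

// Mirrors MTLCreateSystemDefaultDevice(), which returns nil on machines
// that have no Metal device.
Device* acquireDefaultDevice(bool hasMetalDevice) {
    static Device device;
    return hasMetalDevice ? &device : nullptr;
}

struct HeapAllocator {
    Device* m_device;
    std::size_t m_max_buffer_size = 0;

    explicit HeapAllocator(bool hasMetalDevice)
        : m_device(acquireDefaultDevice(hasMetalDevice)) {
        // The buggy constructor queried the device unconditionally during
        // static initialization; guarding the query avoids the crash.
        if (m_device != nullptr) {
            m_max_buffer_size = m_device->maxBufferLength();
        }
    }
};
```

With the guard, constructing the allocator on a device-less machine simply leaves the limit at zero instead of crashing `import torch`.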

Pull Request resolved: pytorch#78136
Approved by: https://github.com/seemethere
kulinseth pushed a commit that referenced this pull request May 25, 2022
… of libtorch_python (pytorch#78028)

Summary:
This moves torch::class_<WorkerInfo> into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work to getting torch::deploy to load an unmodified copy of libtorch. Current RPC is incompatible due to duplicate registrations.

```
unknown file: Failure
C++ exception with description "Exception Caught inside torch::deploy embedded library:
Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once.
Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: [0x7f3ba45cfea1]
frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy)
frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy)
frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy)
frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg)
frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg)
frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy)
frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy)
frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy)
frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy)
frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy)
frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy)
frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy)
frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy)
frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy)
frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy)
frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy)
frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy)
frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy)
frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy)
frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy)
frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy)
frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy)
frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy)
frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6)
frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6)
frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy)
```
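The underlying constraint can be sketched with a minimal registry (hypothetical, standing in for `torch::registerCustomClass`; the real error machinery is omitted): a second registration of the same name throws, which is why the `torch::class_<WorkerInfo>` registration must live in exactly one library.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical registry mirroring the "already registered" check that
// fires when both libtorch and libtorch_python register the same class.
class ClassRegistry {
    std::unordered_set<std::string> names_;
public:
    void registerClass(const std::string& name) {
        // insert().second is false when the name was already present.
        if (!names_.insert(name).second) {
            throw std::runtime_error("Custom class with name " + name +
                                     " is already registered.");
        }
    }
};
```

Loading two libraries that both run the registration is equivalent to calling `registerClass` twice with the same name, hence moving the call into `rpc_agent.cpp` so it executes once.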

Test Plan: CI

Differential Revision: D36564258

Pull Request resolved: pytorch#78028
Approved by: https://github.com/rohan-varma
kulinseth pushed a commit that referenced this pull request Jun 7, 2022
… to conform with non-quantized counterpart filenames

Summary:
Names of analogous files in the quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes
all files in the quantized directory (and its sub-directories) to pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: pytorch#77037

Approved by: https://github.com/jerryzh168
kulinseth pushed a commit that referenced this pull request Jun 10, 2022
…78136) (pytorch#78204)

This prevents `import torch` from accidentally crashing on machines with no Metal devices

Should prevent crashes reported in pytorch#77662 (comment) and https://github.com/pytorch/functorch/runs/6560056366?check_suite_focus=true

Backtrace to the crash:
```
(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff7202be57 libobjc.A.dylib`objc_msgSend + 23
    frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
    frame #2: 0x000000010fda011d libtorch_cpu.dylib`_GLOBAL__sub_I_MPSAllocator.mm + 125
    frame #3: 0x000000010ada81e3 dyld`ImageLoaderMachO::doModInitFunctions(ImageLoader::LinkContext const&) + 535
    frame #4: 0x000000010ada85ee dyld`ImageLoaderMachO::doInitialization(ImageLoader::LinkContext const&) + 40
(lldb) up
frame #1: 0x000000010fd9f524 libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl() + 436
libtorch_cpu.dylib`at::mps::HeapAllocator::MPSHeapAllocatorImpl::MPSHeapAllocatorImpl:
->  0x10fd9f524 <+436>: movq   %rax, 0x1b0(%rbx)
    0x10fd9f52b <+443>: movw   $0x0, 0x1b8(%rbx)
    0x10fd9f534 <+452>: addq   $0x8, %rsp
    0x10fd9f538 <+456>: popq   %rbx
(lldb) disassemble
 ...
    0x10fd9f514 <+420>: movq   0xf19ad15(%rip), %rsi     ; "maxBufferLength"
    0x10fd9f51b <+427>: movq   %r14, %rdi
    0x10fd9f51e <+430>: callq  *0xeaa326c(%rip)          ; (void *)0x00007fff7202be40: objc_msgSend
```

which corresponds to `[m_device maxBufferLength]` call, where `m_device` is not initialized in
https://github.com/pytorch/pytorch/blob/2ae3c59e4bcb8e6e75b4a942cacc2d338c88e609/aten/src/ATen/mps/MPSAllocator.h#L171

Pull Request resolved: pytorch#78136
Approved by: https://github.com/seemethere

Co-authored-by: Nikita Shulga <nshulga@fb.com>
kulinseth pushed a commit that referenced this pull request Jun 21, 2022
kulinseth pushed a commit that referenced this pull request Jun 21, 2022
…ops to use method overloads""

This reverts commit f3665dd.

Reverted pytorch#79819 on behalf of https://github.com/malfet due to land raced with softshrink refs
kulinseth pushed a commit that referenced this pull request Jul 11, 2022
…ytorch#81031)

Re-attempting after original PR pytorch#79596 was reverted due to causing ROCm build failures
Pull Request resolved: pytorch#81031
Approved by: https://github.com/jeffdaily, https://github.com/malfet
pytorchmergebot pushed a commit that referenced this pull request Jul 28, 2022
### Summary:
This PR implements PTQ for APoT FakeQuant. It runs models (Resnet-18 pre-trained model, ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight.

According to the collected accuracy stats, model #2 (uniform activation and APoT weight) appears to have a slight improvement in accuracy compared to model #1 (uniform activation and uniform weight) for 8-bit and significant improvement for 4-bit (see "Accuracy Stats" section below).

### Test Plan:
Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py`

### Accuracy Stats:
8-bit (Uniform int8, APoT b = 8 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.43% (Top-1), 85.62% (Top-5)

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.51% (Top-1), 85.78% (Top-5)

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.32% (Top-1), 85.78% (Top-5)

4-bit (Uniform int4, APoT b = 4 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.63% (Top-1), 71.96% (Top-5)

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.24% (Top-1), 85.56% (Top-5)

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.40% (Top-1), 76.21% (Top-5)

**Full Precision model (FX Graph Mode quantized)**
Evaluation accuracy on test dataset: 69.76% (Top-1), 89.08% (Top-5)

**Eager mode quantized model**
Evaluation accuracy on test dataset: 69.49% (Top-1), 88.90% (Top-5)
Pull Request resolved: pytorch#81040
Approved by: https://github.com/jerryzh168
kulinseth pushed a commit that referenced this pull request Aug 16, 2022
Hi!

I was playing with libFuzzer and found a bug when loading a model from a file via the `torch::jit::load` function.
There is an unhandled exception in caffe2/serialize when calling the `stoull` function on an unsanitized version string.

The bug can be reproduced with `aot_model_compiler` binary:
```
aot_model_compiler --model=crash-stoull --model_name=name --model_version=1 --input_dims='1,3,224,224;2,2' --input_types='float;float'
```

Crash file is provided in [crash.zip](https://github.com/pytorch/pytorch/files/8701504/crash.zip).

gdb output:
```
Temporary breakpoint 1, main (argc=6, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:87
87	      "Run NNC AOT compiler for pytorch model. Example usage:\n"
(gdb) c
Continuing.
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoull

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa637f16859 in __GI_abort () at abort.c:79
#2  0x00007fa6381c1911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fa6381cd38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fa6381cd3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fa6381cd6a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fa6381c42ce in std::__throw_invalid_argument(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x000000000247d567 in __gnu_cxx::__stoa<unsigned long long, unsigned long long, char, int> (__str=0x7ffcd160f228 "ZZ", __idx=0x0, __base=10, __convf=<optimized out>, __name=<optimized out>)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/string_conversions.h:83
#8  std::__cxx11::stoull (__str="ZZ", __idx=0x0, __base=10) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6577
#9  caffe2::serialize::PyTorchStreamReader::init (this=this@entry=0x8c11ce0) at /pytorch_master/caffe2/serialize/inline_container.cc:145
#10 0x000000000247d9c7 in caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader (this=0x8c11ce0, in=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...})
    at /pytorch_master/caffe2/serialize/inline_container.cc:88
#11 0x00000000035b7ba4 in __gnu_cxx::new_allocator<caffe2::serialize::PyTorchStreamReader>::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (
    __p=0x2, __args=..., this=<optimized out>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/new_allocator.h:150
#12 std::allocator_traits<std::allocator<caffe2::serialize::PyTorchStreamReader> >::construct<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__a=...,
    __p=0x2, __p@entry=0x8c11ce0, __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:512
#13 0x00000000035b1988 in std::_Sp_counted_ptr_inplace<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x8c11cd0, __a=..., __args=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:551
#14 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a8, __p=@0x7ffcd160f3a0: 0x10, __args=..., __a=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:683
#15 std::__shared_ptr<caffe2::serialize::PyTorchStreamReader, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0, __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr_base.h:1371
#16 std::shared_ptr<caffe2::serialize::PyTorchStreamReader>::shared_ptr<std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (this=0x7ffcd160f3a0,
    __args=..., __tag=...) at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:408
#17 std::allocate_shared<caffe2::serialize::PyTorchStreamReader, std::allocator<caffe2::serialize::PyTorchStreamReader>, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=..., __a=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:859
#18 std::make_shared<caffe2::serialize::PyTorchStreamReader, std::shared_ptr<caffe2::serialize::ReadAdapterInterface> > (__args=...)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/shared_ptr.h:875
#19 torch::jit::load (rai=std::shared_ptr<class caffe2::serialize::ReadAdapterInterface> (empty) = {...}, device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.:
extra_files=std::unordered_map with 0 elements)
    at /pytorch_master/torch/csrc/jit/serialization/import.cpp:474
#20 0x00000000035b1ef6 in torch::jit::load (filename="crash-stoull", device=device@entry=..., Python Exception <class 'gdb.error'> No type named std::__detail::_Hash_node<struct std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, true>.:
extra_files=std::unordered_map with 0 elements) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:444
#21 0x00000000035b1d22 in torch::jit::load (filename="", device=device@entry=...) at /pytorch_master/torch/csrc/jit/serialization/import.cpp:424
#22 0x00000000008f9be3 in main (argc=1, argv=0x7ffcd160f9f8) at /pytorch_master/binaries/aot_model_compiler.cc:128
```

Pull Request resolved: pytorch#77557
Approved by: https://github.com/Gamrix
kulinseth pushed a commit that referenced this pull request Aug 16, 2022
### Summary:
This PR implements QAT for APoT FakeQuant. It runs QAT with FX graph mode quantized models (Resnet-18 pre-trained model, full ImageNet dataset) to compare accuracy metrics for different qconfig settings of uniform vs. APoT quantized activation and weight. It also refactors the APoT PTQ module `apot_fx_graph_mode_ptq.py` (previously `fx_graph_mode_apot.py`) such that shared helper functions between PTQ and QAT are in a separate file `quantization_util.py`.

Model #2 (uniformly quantized activation, APoT quantized weight) shows accuracy comparable to model #1 (uniformly quantized activation, uniformly quantized weight) for 8-bit, and a significant accuracy improvement for 4-bit (see "Accuracy Stats" section below).

### Test Plan:
Run QAT models with: `python test/quantization/core/experimental/apot_qat.py`
Run PTQ models with: `python test/quantization/core/experimental/apot_ptq.py`

### Accuracy Stats
8-bit (Uniform int8, APoT b = 8 k = 2)

Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.67% (Top-1), 89.04% (Top-5)

Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.72% (Top-1), 89.06% (Top-5)

4-bit (Uniform int4, APoT b = 4 k = 2)

Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 46.85% (Top-1), 72.85% (Top-5)

Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 66.45% (Top-1), 86.23% (Top-5)
Pull Request resolved: pytorch#83282
Approved by: https://github.com/jerryzh168
DenisVieriu97 added a commit that referenced this pull request Jan 24, 2023
DenisVieriu97 added a commit that referenced this pull request Jan 24, 2023
* Test MPS CI runners

* Cherry pick remaining files

* Enable lintrunner:

* Change lint  runner

* Retrigger checks

* Retrigger checks #2

* Retrigger checks #3

* Retrigger checks #4

* Retrigger checks #5

* Retrigger checks #5

* Retrigger checks #7

* Retrigger checks #8

* Retrigger checks #9

* Retrigger checks #9 (change arch to arm)

* Retrigger checks #10

* Retrigger checks #11

* Retrigger checks #12

* Retrigger checks #13

* Retrigger checks #14

* Retrigger checks #14

* Retrigger checks #15

* Retrigger checks #16

* Retrigger checks #16

* Retrigger checks #17

* Retrigger checks #19

* Retrigger checks #20

* Retrigger checks #21

* Fix lintrunner

* Fix lintrunner

* Remove lint.json
kulinseth pushed a commit that referenced this pull request Feb 6, 2023
DenisVieriu97 added a commit that referenced this pull request Feb 10, 2023
kulinseth pushed a commit that referenced this pull request Feb 10, 2023
…pp (pytorch#94295)

Hi!

I've been fuzzing different pytorch modules, and found a few crashes.

Specifically, I'm talking about `schema_type_parser.cpp` and `irparser.cpp`. Inside these files, different standard conversion functions are used (such as `stoll`, `stoi`, `stod`, `stoull`). However, default `std` exceptions, such as `std::out_of_range`, `std::invalid_argument`, are not handled.

Some of the crash-files:

1. [crash-493db74c3426e79b2bf0ffa75bb924503cb9acdc.zip](https://github.com/pytorch/pytorch/files/10237616/crash-493db74c3426e79b2bf0ffa75bb924503cb9acdc.zip) - crash source: schema_type_parser.cpp:272

2. [crash-67bb5d34ca48235687cc056e2cdeb2476b8f4aa5.zip](https://github.com/pytorch/pytorch/files/10237618/crash-67bb5d34ca48235687cc056e2cdeb2476b8f4aa5.zip) - crash source: schema_type_parser.cpp:240

3. [crash-0157bca5c41bffe112aa01f3b0f2099ca4bcc62f.zip](https://github.com/pytorch/pytorch/files/10307970/crash-0157bca5c41bffe112aa01f3b0f2099ca4bcc62f.zip) - crash source: schema_type_parser.cpp:179

4. [crash-430da923e56adb9569362efa7fa779921371b710.zip](https://github.com/pytorch/pytorch/files/10307972/crash-430da923e56adb9569362efa7fa779921371b710.zip) - crash source: schema_type_parser.cpp:196

The provided patch adds exception handlers for `std::invalid_argument` and `std::out_of_range`, to rethrow these exceptions with `ErrorReport`.
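The rethrow pattern the patch describes can be sketched as follows (names are hypothetical, and a plain `std::runtime_error` stands in for `ErrorReport`): catch the standard conversion exceptions and rethrow with parser context so the failure is reportable instead of fatal.

```cpp
#include <stdexcept>
#include <string>

// Convert a parser token to an integer, translating std::stoll's
// exceptions into a context-carrying error instead of letting them
// escape and call std::terminate.
long long parseTokenAsInt(const std::string& token, int line) {
    try {
        return std::stoll(token);
    } catch (const std::invalid_argument&) {
        throw std::runtime_error("expected an integer at line " +
                                 std::to_string(line) + ", got '" + token + "'");
    } catch (const std::out_of_range&) {
        throw std::runtime_error("integer out of range at line " +
                                 std::to_string(line) + ": '" + token + "'");
    }
}
```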

### How to reproduce

1. To reproduce the crash, use provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/blob/master/projects/pytorch/Dockerfile)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy crash file to the current directory

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. Execute the binary: `/irparser_fuzz /homedir/crash-67bb5d34ca48235687cc056e2cdeb2476b8f4aa5`

After execution completes you will see this error message:

```txt
terminate called after throwing an instance of 'std::out_of_range'
  what():  stoll
```

And this stacktrace:

```asan
==9626== ERROR: libFuzzer: deadly signal
    #0 0x5b4cf1 in __sanitizer_print_stack_trace /llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
    #1 0x529627 in fuzzer::PrintStackTrace() /llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
    #2 0x50f833 in fuzzer::Fuzzer::CrashCallback() /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233:3
    #3 0x7ffff7c3741f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
    #4 0x7ffff7a5700a in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300a)
    #5 0x7ffff7a36858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858)
    #6 0x7ffff7e74910  (/lib/x86_64-linux-gnu/libstdc++.so.6+0x9e910)
    #7 0x7ffff7e8038b  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa38b)
    #8 0x7ffff7e803f6 in std::terminate() (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa3f6)
    #9 0x7ffff7e806a8 in __cxa_throw (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa6a8)
    #10 0x7ffff7e7737d in std::__throw_out_of_range(char const*) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa137d)
    #11 0xbd0579 in long long __gnu_cxx::__stoa<long long, long long, char, int>(long long (*)(char const*, char**, int), char const*, char const*, unsigned long*, int) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/ext/string_conversions.h:86:2
    #12 0xc10f9c in std::__cxx11::stoll(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long*, int) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/basic_string.h:6572:12
    #13 0xc10f9c in torch::jit::SchemaTypeParser::parseRefinedTensor()::$_2::operator()() const::'lambda'()::operator()() const /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:240:25
    #14 0xc10f9c in void c10::function_ref<void ()>::callback_fn<torch::jit::SchemaTypeParser::parseRefinedTensor()::$_2::operator()() const::'lambda'()>(long) /pytorch_fuzz/c10/util/FunctionRef.h:43:12
    #15 0xbfbb27 in torch::jit::SchemaTypeParser::parseList(int, int, int, c10::function_ref<void ()>) /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:424:7
    #16 0xc0ef24 in torch::jit::SchemaTypeParser::parseRefinedTensor()::$_2::operator()() const /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:236:9
    #17 0xc0ef24 in void c10::function_ref<void ()>::callback_fn<torch::jit::SchemaTypeParser::parseRefinedTensor()::$_2>(long) /pytorch_fuzz/c10/util/FunctionRef.h:43:12
    #18 0xbfbb27 in torch::jit::SchemaTypeParser::parseList(int, int, int, c10::function_ref<void ()>) /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:424:7
    #19 0xbff590 in torch::jit::SchemaTypeParser::parseRefinedTensor() /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:209:3
    #20 0xc02992 in torch::jit::SchemaTypeParser::parseType() /pytorch_fuzz/torch/csrc/jit/frontend/schema_type_parser.cpp:362:13
    #21 0x9445642 in torch::jit::IRParser::parseVarWithType(bool) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:111:35
    #22 0x944ff4c in torch::jit::IRParser::parseOperatorOutputs(std::vector<torch::jit::VarWithType, std::allocator<torch::jit::VarWithType> >*)::$_0::operator()() const /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:138:21
    #23 0x944ff4c in void std::__invoke_impl<void, torch::jit::IRParser::parseOperatorOutputs(std::vector<torch::jit::VarWithType, std::allocator<torch::jit::VarWithType> >*)::$_0&>(std::__invoke_other, torch::jit::IRParser::parseOperatorOutputs(std::vector<torch::jit::VarWithType, std::allocator<torch::jit::VarWithType> >*)::$_0&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
    #24 0x94463a7 in torch::jit::IRParser::parseList(int, int, int, std::function<void ()> const&) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:498:7
    #25 0x94460a5 in torch::jit::IRParser::parseOperatorOutputs(std::vector<torch::jit::VarWithType, std::allocator<torch::jit::VarWithType> >*) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:137:3
    #26 0x944c1ce in torch::jit::IRParser::parseOperator(torch::jit::Block*) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:384:3
    #27 0x944bf56 in torch::jit::IRParser::parseOperatorsList(torch::jit::Block*) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:362:5
    #28 0x9444f5f in torch::jit::IRParser::parse() /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:482:3
    #29 0x94448df in torch::jit::parseIR(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::Graph*, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, torch::jit::Value*, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, torch::jit::Value*> > >&) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:94:5
    #30 0x944526e in torch::jit::parseIR(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::Graph*) /pytorch_fuzz/torch/csrc/jit/ir/irparser.cpp:99:3
    #31 0x5e3ebd in LLVMFuzzerTestOneInput /irparser_fuzz.cc:43:5
    #32 0x510d61 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #33 0x4fac7c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #34 0x5009cb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #35 0x529f62 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #36 0x7ffff7a38082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #37 0x4f559d in _start (/irparser_fuzz+0x4f559d)

```

Following these steps with the remaining crashes will give you almost the same results.
Pull Request resolved: pytorch#94295
Approved by: https://github.com/davidberard98
kulinseth pushed a commit that referenced this pull request Feb 10, 2023
* Test MPS CI runners

* Cherry pick remaining files

* Enable lintrunner:

* Change lint  runner

* Retrigger checks

* Retrigger checks #2

* Retrigger checks #3

* Retrigger checks #4

* Retrigger checks #5

* Retrigger checks #5

* Retrigger checks #7

* Retrigger checks #8

* Retrigger checks #9

* Retrigger checks #9 (change arch to arm)

* Retrigger checks #10

* Retrigger checks #11

* Retrigger checks #12

* Retrigger checks #13

* Retrigger checks #14

* Retrigger checks #14

* Retrigger checks #15

* Retrigger checks #16

* Retrigger checks #16

* Retrigger checks #17

* Retrigger checks #19

* Retrigger checks #20

* Retrigger checks #21

* Fix lintrunner

* Fix lintrunner

* Remove lint.json
DenisVieriu97 added a commit that referenced this pull request Feb 10, 2023
* Enable test modules on MPS and CI runners

* Update lint.yml

* Update comments

* Retrigger CI

* Retrigger CI #2

* Remove comment
DenisVieriu97 added a commit that referenced this pull request Feb 13, 2023
kulinseth pushed a commit that referenced this pull request Mar 2, 2023
* Cancel jobs in progress if a new PR is created

* Cancel jobs in progress if a new PR is created #2

* Cancel jobs in progress if a new PR is created #3

* Cancel jobs in progress if a new PR is created #4
kulinseth pushed a commit that referenced this pull request Mar 3, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 6, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 6, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 7, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 7, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 7, 2023
DenisVieriu97 added a commit that referenced this pull request Mar 9, 2023
* Run torchbench on CI

* Run torchbench on CI #2

* Run torchbench on CI #3

* Run torchbench on CI #4

* Run torchbench on CI #5

* Run torchbench on CI #6

* Run torchbench on CI #7

* Change #1

* Change #2

* Run torchbench on CI #8

* Run torchbench on CI #9

* Install dep

* Change #11

* Change #12

* Change #13

* Change #14

* Change #15

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* More fixes

* Update dependencies

* other fixes

* more fixes

* more fixes

* more fixes

* more fixes #2

* more fixes #2

* more fixes #2

* more fixes

* other fixes

* other fixes

* other fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* additional changes

* other fixes

* other fixes

* additional changes

* other fixes

* other fixes

* other fixes

* other fixes

* other fixes

* Address PR comments

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes

* more fixes
kulinseth pushed a commit that referenced this pull request Mar 14, 2023
Fixes part of pytorch#96414

Replaces any calls to sizes with sym_sizes. Still seeing an error with the repro script:
``` Bash
Exception raised from sizes_default at /scratch/drisspg/work/pytorch/c10/core/TensorImpl.h:635 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x7d (0x7f697f4a141d in /scratch/drisspg/work/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xdd (0x7f697f49fbcd in /scratch/drisspg/work/pytorch/torch/lib/libc10.so)
frame #2: c10::TensorImpl::sizes_custom() const + 0x95 (0x7f697f4824c5 in /scratch/drisspg/work/pytorch/torch/lib/libc10.so)
frame #3: at::native::empty_like(at::Tensor const&, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x92c (0x7f69809d18ac in /scratch/drisspg/work/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x23f5ce7 (0x7f698193bce7 in /scratch/drisspg/work/pytorch/torch/lib/libtorch_cpu.so)
```

Still trying to track down this empty call.

From the looks of it, it might be coming from at::layer_norm? The backtrace from lldb is 221 frames, however, so there's lots of noise.

Pull Request resolved: pytorch#96674
Approved by: https://github.com/ezyang
razarmehr pushed a commit that referenced this pull request Mar 17, 2023
skotapati pushed a commit that referenced this pull request Mar 29, 2023
DenisVieriu97 added a commit that referenced this pull request Apr 6, 2023
DenisVieriu97 added a commit that referenced this pull request Apr 6, 2023
kulinseth added a commit that referenced this pull request Apr 7, 2023
* More fixes for strided kernels

* Fix type inference crash

* Address comments

* More fixes

* Remove logs

* Clean up

* More clean up

* More clean up #2

* More clean up #2

* More clean up #3

* Fix metal version

* Use native binary kernels

* Fix build failure & add more ops

* Fix lint

* Fix failing tests

---------

Co-authored-by: Kulin Seth <kulin_seth@apple.com>
skotapati pushed a commit that referenced this pull request Apr 7, 2023
skotapati pushed a commit that referenced this pull request Apr 7, 2023
skotapati pushed a commit that referenced this pull request Apr 7, 2023
brkirch pushed a commit to brkirch/pytorch that referenced this pull request Apr 9, 2023
razarmehr pushed a commit that referenced this pull request May 4, 2023
…#94297)

Hi!

I've been fuzzing different pytorch modules, and found a crash inside one of them.

Specifically, I'm talking about a module that processes `script_call` rpc requests and a function `ScriptCall::fromIValues(std::vector<at::IValue>& ivalues)`.

Running this test case causes a crash when `ivalues.back()` is called at [script_call.cpp:90](https://github.com/pytorch/pytorch/blob/abc54f93145830b502400faa92bec86e05422fbd/torch/csrc/distributed/rpc/script_call.cpp#L90). The crash occurs because the vector `ivalues` is empty.

All tests were performed on this pytorch version: [abc54f9](https://github.com/pytorch/pytorch/tree/abc54f93145830b502400faa92bec86e05422fbd)

The provided patch checks if there are enough elements in the ivalues vector.

### How to reproduce

1. To reproduce the crash, use provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy crash file to the current directory:

    - [crash-9f76d4e37a2391136a4ce07d47269db1e063e4b4.zip](https://github.com/pytorch/pytorch/files/10674059/crash-9f76d4e37a2391136a4ce07d47269db1e063e4b4.zip)

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. And execute the binary: `/message_deserialize_fuzz /homedir/crash-9f76d4e37a2391136a4ce07d47269db1e063e4b4`

After execution completes you will see this stacktrace:

```asan
AddressSanitizer:DEADLYSIGNAL
=================================================================
==57==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x0000008e7b19 bp 0x7ffd2fdded70 sp 0x7ffd2fddec40 T0)
==57==The signal is caused by a READ memory access.
==57==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x8e7b19 in c10::IValue::isString() const /pytorch_fuzz/aten/src/ATen/core/ivalue.h:639:27
    #1 0x8e7b19 in c10::IValue::toStringRef[abi:cxx11]() const /pytorch_fuzz/aten/src/ATen/core/ivalue_inl.h:2179:3
    #2 0xe04fb58 in torch::distributed::rpc::ScriptCall::fromIValues(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_call.cpp:90:53
    #3 0xe0511f0 in torch::distributed::rpc::ScriptCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_call.cpp:133:10
    #4 0xe0ff71e in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/utils.cpp:102:14
    #5 0x602a41 in LLVMFuzzerTestOneInput /message_deserialize_fuzz.cc:192:27
    #6 0x52ce61 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #7 0x516d7c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #8 0x51cacb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #9 0x546062 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #10 0x7f41e42a8082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #11 0x51169d in _start (/message_deserialize_fuzz+0x51169d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /pytorch_fuzz/aten/src/ATen/core/ivalue.h:639:27 in c10::IValue::isString() const
==57==ABORTING
```
Pull Request resolved: pytorch#94297
Approved by: https://github.com/ezyang
razarmehr pushed a commit that referenced this pull request May 4, 2023
…ytorch#94300)

Hi!

I've been fuzzing different pytorch modules, and found a crash inside one of them.

Specifically, I'm talking about a module for unpickling and a function called `Unpickler::readInstruction()`. Running this function with the provided crash file results in a crash while calling `auto dict = stack_.at(dict_pos).toGenericDict();` [unpickler.cpp:561](https://github.com/pytorch/pytorch/blob/0e94fbc0c8ab1572c88159c1a4c397b6eb824c01/torch/csrc/jit/serialization/unpickler.cpp#L561). The crash occurs because the index `dict_pos` is out of bounds (which in turn happens because the stack size is 0).

Besides this pull request, there is another one related to unpickler hardening: pytorch#84343

All tests were performed on this pytorch version: [abc54f9](https://github.com/pytorch/pytorch/tree/abc54f93145830b502400faa92bec86e05422fbd)

### How to reproduce

1. To reproduce the crash, use provided docker: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy crash file to the current directory:

    - [crash-042dff5e121580425d9d34d0f293918f3c9fbf1e.zip](https://github.com/pytorch/pytorch/files/10674361/crash-042dff5e121580425d9d34d0f293918f3c9fbf1e.zip)

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. And execute the binary: `/message_deserialize_sydr /homedir/crash-042dff5e121580425d9d34d0f293918f3c9fbf1e`

After execution completes you will see this error message:

```txt
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551613) >= this->size() (which is 0)
```

And this stacktrace:

```asan
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551613) >= this->size() (which is 0)
==39== ERROR: libFuzzer: deadly signal
    #0 0x5d0df1 in __sanitizer_print_stack_trace /llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
    #1 0x545727 in fuzzer::PrintStackTrace() /llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
    #2 0x52b933 in fuzzer::Fuzzer::CrashCallback() /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233:3
    #3 0x7f9118e0341f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
    #4 0x7f9118c2300a in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300a)
    #5 0x7f9118c02858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858)
    #6 0x7f9119040910  (/lib/x86_64-linux-gnu/libstdc++.so.6+0x9e910)
    #7 0x7f911904c38b  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa38b)
    #8 0x7f911904c3f6 in std::terminate() (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa3f6)
    #9 0x7f911904c6a8 in __cxa_throw (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa6a8)
    #10 0x7f91190433aa  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa13aa)
    #11 0x63acdf in std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_range_check(unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1073:4
    #12 0xce8f93e in std::vector<c10::IValue, std::allocator<c10::IValue> >::at(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1094:2
    #13 0xce8f93e in torch::jit::Unpickler::readInstruction() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:546:26
    #14 0xce8d527 in torch::jit::Unpickler::run() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:235:27
    #15 0xce8d1c2 in torch::jit::Unpickler::parse_ivalue() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:192:3
    #16 0xcdf0792 in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:127:20
    #17 0xcdf104d in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:137:10
    #18 0xe0532db in torch::distributed::rpc::ScriptRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_remote_call.cpp:74:16
    #19 0xe0ffa10 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/utils.cpp:108:14
    #20 0x602a41 in LLVMFuzzerTestOneInput /message_deserialize_fuzz.cc:192:27
    #21 0x52ce61 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #22 0x516d7c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #23 0x51cacb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #24 0x546062 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #25 0x7f9118c04082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #26 0x51169d in _start (/message_deserialize_fuzz+0x51169d)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
```
Pull Request resolved: pytorch#94300
Approved by: https://github.com/malfet, https://github.com/apach301
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants