
Add more security checks to fix crashing during torch::jit::load #84343

Closed
apach301 wants to merge 1 commit into pytorch:master from apach301:add-security-checks

Conversation

@apach301
Contributor

@apach301 apach301 commented Aug 31, 2022

These changes make loading models with torch::jit::load more stable. The proposed checks fix multiple segmentation faults and heap buffer overflows that were found while fuzzing PyTorch with sydr-fuzz.

Some of the fixed bugs:

1. Heap buffer overflow that leads to a crash
   crash-842314913bf1820ec19cddfbb7400ffdbb756920.zip

   ```asan
   ==3751==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000033478 at pc 0x0000005f9bc3 bp 0x7fffffff1eb0 sp 0x7fffffff1ea8
   READ of size 4 at 0x619000033478 thread T0
   [Detaching after fork from child process 3762]
       #0 0x5f9bc2 in c10::IValue::IValue(c10::IValue&&) /pytorch_fuzz/aten/src/ATen/core/ivalue.h:192:43
       #1 0x9ecd0a7 in torch::jit::pop(std::vector<c10::IValue, std::allocator<c10::IValue> >&) /pytorch_fuzz/aten/src/ATen/core/stack.h:102:12
       #2 0x9ecd0a7 in torch::jit::Unpickler::readInstruction() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:380:17
       #3 0x9ecafc7 in torch::jit::Unpickler::run() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:226:27
       #4 0x9ecac62 in torch::jit::Unpickler::parse_ivalue() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:183:3
       #5 0x9e45996 in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:127:20
       #6 0x9e4626d in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:137:10
   ```

2. Segmentation fault
   crash-e690c58718e88921350562f0b4d9180938145d77.zip

   ```asan
   ==3744==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x000009122754 bp 0x7fffffff5290 sp 0x7fffffff5270 T0)
   ==3744==The signal is caused by a READ memory access.
   ==3744==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
   [Detaching after fork from child process 3763]
       #0 0x9122754 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::retain_() /pytorch_fuzz/c10/util/intrusive_ptr.h:269:54
       #1 0x9127929 in c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> >::intrusive_ptr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch_fuzz/c10/util/intrusive_ptr.h:352:5
       #2 0x9127929 in torch::jit::Expr::Expr(c10::intrusive_ptr<torch::jit::Tree, c10::detail::intrusive_target_default_null_type<torch::jit::Tree> > const&) /pytorch_fuzz/torch/csrc/jit/frontend/tree_views.h:269:49
       #3 0x91b1bbb in torch::jit::Maybe<torch::jit::Expr>::get() const /pytorch_fuzz/torch/csrc/jit/frontend/tree_views.h:211:12
       #4 0x92a8f74 in torch::jit::ScriptTypeParser::parseClassConstant(torch::jit::Assign const&) /pytorch_fuzz/torch/csrc/jit/frontend/script_type_parser.cpp:461:41
       #5 0x9e1c09b in torch::jit::SourceImporterImpl::importClass(c10::QualifiedName const&, torch::jit::ClassDef const&, bool) /pytorch_fuzz/torch/csrc/jit/serialization/import_source.cpp:549:34
       #6 0x9e13f00 in torch::jit::SourceImporterImpl::importNamedType(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, torch::jit::ClassDef const&) /pytorch_fuzz/torch/csrc/jit/serialization/import_source.cpp:288:5
       #7 0x9e11fbc in torch::jit::SourceImporterImpl::findNamedType(c10::QualifiedName const&) /pytorch_fuzz/torch/csrc/jit/serialization/import_source.cpp:140:5
   ```

3. Unhandled out-of-bounds access in a vector (uncaught `std::out_of_range`)
   crash-ccd524e7ba19a37982dd91e0d6fc06bb26dd0b10.zip

   ```asan
   ==3792== ERROR: libFuzzer: deadly signal
   [Detaching after fork from child process 3809]
       #0 0x59cc11 in __sanitizer_print_stack_trace /llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
       #1 0x511547 in fuzzer::PrintStackTrace() /llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
       #2 0x4f7753 in fuzzer::Fuzzer::CrashCallback() /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233:3
       #3 0x7ffff7c6741f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
       #4 0x7ffff7a8700a in __libc_signal_restore_set /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
       #5 0x7ffff7a8700a in raise /build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:48:3
       #6 0x7ffff7a66858 in abort /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:79:7
       #7 0x7ffff7e73910  (/lib/x86_64-linux-gnu/libstdc++.so.6+0x9e910)
       #8 0x7ffff7e7f38b  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa38b)
       #9 0x7ffff7e7f3f6 in std::terminate() (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa3f6)
       #10 0x7ffff7e7f6a8 in __cxa_throw (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa6a8)
       #11 0x7ffff7e763aa  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa13aa)
       #12 0x6aeedf in std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_range_check(unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1073:4
       #13 0x9ecd66c in torch::jit::Unpickler::readInstruction() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp
       #14 0x9ecafc7 in torch::jit::Unpickler::run() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:226:27
       #15 0x9ecac62 in torch::jit::Unpickler::parse_ivalue() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:183:3
   ```
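The fixes listed above share one pattern: validate the stack (or optional field) before touching it, and throw a descriptive exception instead of reading out of bounds. Below is a minimal sketch of that pattern, assuming a simplified stack machine; `CheckedStack`, `fail`, and `require` are hypothetical names, `int` stands in for `c10::IValue`, `std::optional` stands in for `torch::jit::Maybe`, and `std::runtime_error` stands in for whatever exception type the real checks use (PyTorch code commonly uses `TORCH_CHECK`, but the exact checks in this PR are not reproduced here).

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for c10::IValue; any movable type would do.
using Value = int;

// Hypothetical helper: report malformed input with a readable message.
[[noreturn]] inline void fail(const std::string& msg) {
  throw std::runtime_error("malformed pickle: " + msg);
}

class CheckedStack {
 public:
  void push(Value v) { data_.push_back(std::move(v)); }

  // Pop the top element. Calling back() on an empty std::vector is
  // undefined behaviour, the kind of out-of-bounds read in the first report.
  Value pop(const char* opcode) {
    if (data_.empty()) {
      fail(std::string(opcode) + " expected a value but the stack is empty");
    }
    Value v = std::move(data_.back());
    data_.pop_back();
    return v;
  }

  // Access the element `depth` positions below the top (0 = top), checking
  // the depth up front rather than relying on vector::at() throwing.
  Value& peek(std::size_t depth, const char* opcode) {
    if (depth >= data_.size()) {
      fail(std::string(opcode) + " needs " + std::to_string(depth + 1) +
           " stack entries but only " + std::to_string(data_.size()) +
           " are present");
    }
    return data_[data_.size() - 1 - depth];
  }

  std::size_t size() const { return data_.size(); }

 private:
  std::vector<Value> data_;
};

// Checked access to an optional field: only dereference after confirming
// the value is present (compare Maybe<Expr>::get() in the second report).
template <typename T>
const T& require(const std::optional<T>& maybe, const char* what) {
  if (!maybe.has_value()) {
    fail(std::string(what) + " is missing");
  }
  return *maybe;
}
```

With checks like these, malformed input surfaces as an ordinary C++ exception that a caller can catch and report, rather than as a segmentation fault or a `std::terminate` call.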

Some other crashes found by the fuzzer:

- crash-0cab888cbd1e9fea92ab6ddeadf40b958b87d62b.zip
- crash-04c9ba8e3b0f15028fd0fb0ed014fd352e182a1d.zip
- crash-422ad8c3a3472980ba751f4c7f79cf2b53e49927.zip

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

pytorch-bot added the `release notes: jit` label on Aug 31, 2022
@facebook-github-bot
Contributor

facebook-github-bot commented Aug 31, 2022

🔗 Helpful links

❌ 5 New Failures

As of commit f68c4c1292 (more details on the Dr. CI page):

  • 5/5 failures introduced in this PR

🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages.

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) (1/5)

Step: "Test"

2022-08-31T12:57:16.1988425Z RuntimeError: test_proxy_tensor failed!
2022-08-31T12:57:12.7294612Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorFake-20220831124555.xml
2022-08-31T12:57:12.7315845Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorReal-20220831124555.xml
2022-08-31T12:57:12.7355732Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorSymbolic-20220831124555.xml
2022-08-31T12:57:12.8773436Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestProxyTensorOpInfoCPU-20220831124555.xml
2022-08-31T12:57:12.8781365Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestSymbolicTracing-20220831124555.xml
2022-08-31T12:57:16.1983819Z Traceback (most recent call last):
2022-08-31T12:57:16.1984157Z   File "test/run_test.py", line 1065, in <module>
2022-08-31T12:57:16.1986026Z     main()
2022-08-31T12:57:16.1986283Z   File "test/run_test.py", line 1043, in main
2022-08-31T12:57:16.1988165Z     raise RuntimeError(err_message)
2022-08-31T12:57:16.1988425Z RuntimeError: test_proxy_tensor failed!
2022-08-31T12:57:16.5270549Z 
2022-08-31T12:57:16.5271001Z real	23m21.960s
2022-08-31T12:57:16.5271210Z user	95m25.602s
2022-08-31T12:57:16.5271423Z sys	4m3.391s
2022-08-31T12:57:16.5309001Z ##[error]Process completed with exit code 1.
2022-08-31T12:57:16.5362105Z Prepare all required actions
2022-08-31T12:57:16.5362424Z Getting action download info
2022-08-31T12:57:16.7124723Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-31T12:57:16.7124944Z with:
2022-08-31T12:57:16.7125267Z   github-token: ***

See GitHub Actions build pull / linux-focal-py3.7-clang7-asan / test (default, 3, 5, linux.2xlarge) (2/5)

Step: "Test"

2022-08-31T13:38:14.4218927Z RuntimeError: test_proxy_tensor failed!
2022-08-31T13:38:09.6689202Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorFake-20220831132352.xml
2022-08-31T13:38:09.6715724Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorReal-20220831132352.xml
2022-08-31T13:38:09.6743888Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorSymbolic-20220831132352.xml
2022-08-31T13:38:10.8547529Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestProxyTensorOpInfoCPU-20220831132352.xml
2022-08-31T13:38:10.8556757Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestSymbolicTracing-20220831132352.xml
2022-08-31T13:38:14.4209799Z Traceback (most recent call last):
2022-08-31T13:38:14.4210275Z   File "test/run_test.py", line 1065, in <module>
2022-08-31T13:38:14.4213665Z     main()
2022-08-31T13:38:14.4214007Z   File "test/run_test.py", line 1043, in main
2022-08-31T13:38:14.4218563Z     raise RuntimeError(err_message)
2022-08-31T13:38:14.4218927Z RuntimeError: test_proxy_tensor failed!
2022-08-31T13:38:15.0433552Z 
2022-08-31T13:38:15.0433949Z real	61m14.902s
2022-08-31T13:38:15.0434166Z user	60m55.148s
2022-08-31T13:38:15.0434327Z sys	0m15.188s
2022-08-31T13:38:15.0475793Z ##[error]Process completed with exit code 1.
2022-08-31T13:38:15.0545187Z Prepare all required actions
2022-08-31T13:38:15.0545492Z Getting action download info
2022-08-31T13:38:15.2181135Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-31T13:38:15.2181356Z with:
2022-08-31T13:38:15.2181694Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (dynamo, 2, 2, linux.2xlarge) (3/5)

Step: "Test"

2022-08-31T13:20:22.9256271Z AssertionError: torch.* op returned non-Tensor int call_method data_ptr
2022-08-31T13:20:22.9175236Z     self.assertRaises(IndexError, lambda: reference[0.0, :, 0.0])
2022-08-31T13:20:22.9175526Z 
2022-08-31T13:20:22.9175764Z Set torchdynamo.config.verbose=True for more information
2022-08-31T13:20:22.9176170Z ==========
2022-08-31T13:20:22.9188651Z ok (2.039s)
2022-08-31T13:20:22.9253350Z   test_index_getitem_copy_bools_slices_cpu (__main__.TestIndexingCPU) ... torchdynamo.convert_frame: [ERROR] WON'T CONVERT test_index_getitem_copy_bools_slices test_indexing.py line 1003 
2022-08-31T13:20:22.9254006Z due to: 
2022-08-31T13:20:22.9254308Z Traceback (most recent call last):
2022-08-31T13:20:22.9254769Z   File "/var/lib/jenkins/torchdynamo/torchdynamo/variables/tensor.py", line 283, in create
2022-08-31T13:20:22.9255562Z     ), f"torch.* op returned non-Tensor {typestr(example_value)} {proxy.node.op} {proxy.node.target}"
2022-08-31T13:20:22.9256271Z AssertionError: torch.* op returned non-Tensor int call_method data_ptr
2022-08-31T13:20:22.9256545Z 
2022-08-31T13:20:22.9256656Z from user code:
2022-08-31T13:20:22.9257033Z    File "test_indexing.py", line 1010, in test_index_getitem_copy_bools_slices
2022-08-31T13:20:22.9257514Z     self.assertNotEqual(a.data_ptr(), a[True].data_ptr())
2022-08-31T13:20:22.9257799Z 
2022-08-31T13:20:22.9258042Z Set torchdynamo.config.verbose=True for more information
2022-08-31T13:20:22.9258407Z ==========
2022-08-31T13:20:22.9269111Z ok (0.007s)
2022-08-31T13:20:23.0660060Z   test_index_put_accumulate_duplicate_indices_cpu (__main__.TestIndexingCPU) ... ok (0.138s)
2022-08-31T13:20:23.1467373Z   test_index_put_accumulate_expanded_values_cpu (__main__.TestIndexingCPU) ... ok (0.080s)

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (default, 2, 2, linux.2xlarge) (4/5)

Step: "Test"

2022-08-31T12:56:33.1038509Z RuntimeError: test_proxy_tensor failed!
2022-08-31T12:56:29.8068405Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorFake-20220831124504.xml
2022-08-31T12:56:29.8088951Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorReal-20220831124504.xml
2022-08-31T12:56:29.8111243Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorSymbolic-20220831124504.xml
2022-08-31T12:56:29.9503418Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestProxyTensorOpInfoCPU-20220831124504.xml
2022-08-31T12:56:29.9510745Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestSymbolicTracing-20220831124504.xml
2022-08-31T12:56:33.1033336Z Traceback (most recent call last):
2022-08-31T12:56:33.1033652Z   File "test/run_test.py", line 1065, in <module>
2022-08-31T12:56:33.1035633Z     main()
2022-08-31T12:56:33.1035854Z   File "test/run_test.py", line 1043, in main
2022-08-31T12:56:33.1038116Z     raise RuntimeError(err_message)
2022-08-31T12:56:33.1038509Z RuntimeError: test_proxy_tensor failed!
2022-08-31T12:56:33.4056855Z 
2022-08-31T12:56:33.4057140Z real	23m15.148s
2022-08-31T12:56:33.4057436Z user	87m48.993s
2022-08-31T12:56:33.4057698Z sys	0m33.972s
2022-08-31T12:56:33.4093994Z ##[error]Process completed with exit code 1.
2022-08-31T12:56:33.4134178Z Prepare all required actions
2022-08-31T12:56:33.4134482Z Getting action download info
2022-08-31T12:56:33.6673180Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-31T12:56:33.6673406Z with:
2022-08-31T12:56:33.6673718Z   github-token: ***

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (crossref, 1, 2, linux.2xlarge) (5/5)

Step: "Test"

2022-08-31T13:18:54.4214843Z RuntimeError: test_proxy_tensor failed!
2022-08-31T13:18:51.0912528Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorFake-20220831130714.xml
2022-08-31T13:18:51.0933918Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorReal-20220831130714.xml
2022-08-31T13:18:51.0956219Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestGenericProxyTensorSymbolic-20220831130714.xml
2022-08-31T13:18:51.2361313Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestProxyTensorOpInfoCPU-20220831130714.xml
2022-08-31T13:18:51.2368647Z Generated XML report: test-reports/python-unittest/test_proxy_tensor/TEST-TestSymbolicTracing-20220831130714.xml
2022-08-31T13:18:54.4206945Z Traceback (most recent call last):
2022-08-31T13:18:54.4207354Z   File "test/run_test.py", line 1065, in <module>
2022-08-31T13:18:54.4213705Z     main()
2022-08-31T13:18:54.4214117Z   File "test/run_test.py", line 1043, in main
2022-08-31T13:18:54.4214538Z     raise RuntimeError(err_message)
2022-08-31T13:18:54.4214843Z RuntimeError: test_proxy_tensor failed!
2022-08-31T13:18:54.7218537Z 
2022-08-31T13:18:54.7219050Z real	44m59.163s
2022-08-31T13:18:54.7219269Z user	181m4.802s
2022-08-31T13:18:54.7219457Z sys	14m21.597s
2022-08-31T13:18:54.7257634Z ##[error]Process completed with exit code 1.
2022-08-31T13:18:54.7307634Z Prepare all required actions
2022-08-31T13:18:54.7307920Z Getting action download info
2022-08-31T13:18:54.9180209Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-31T13:18:54.9180436Z with:
2022-08-31T13:18:54.9180763Z   github-token: ***

This comment was automatically generated by Dr. CI.

facebook-github-bot added the `oncall: jit` label on Aug 31, 2022
soulitzer requested a review from eellison on September 1, 2022 19:46
soulitzer added the `triaged` label on Sep 1, 2022
apach301 force-pushed the `add-security-checks` branch from f68c4c1 to 86ea4ce on September 9, 2022 12:06
@pytorch-bot

pytorch-bot bot commented Sep 9, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/84343

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1916e84:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: apach301 / name: Daniel Kuts (86ea4cebec71f2b1e55d7b5897fb8c50f2c609c1)

kit1980 requested a review from jerryzh168 on October 25, 2022 00:20
@kit1980
Contributor

kit1980 commented Oct 25, 2022

@jerryzh168 Please take a look if this is something we want to do.

@jerryzh168
Contributor

> @jerryzh168 Please take a look if this is something we want to do.

I'm not very familiar with this. cc @gmagogsfm, can you take a look?

@github-actions
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

apach301 force-pushed the `add-security-checks` branch from 3a115fa to 1916e84 on January 17, 2023 13:06
@apach301
Contributor Author

@gmagogsfm @kit1980 @eellison
Could you please review the changes? I rebased the branch on the current master. This PR adds security checks that prevent segmentation faults by throwing exceptions with user-friendly descriptions instead. Also, I can't remove the Stale label myself.

apach301 closed this on Feb 14, 2023
apach301 deleted the `add-security-checks` branch on February 21, 2023 15:31
pytorchmergebot pushed a commit that referenced this pull request on May 2, 2023
…94300)

Hi!

I've been fuzzing different PyTorch modules and found a crash inside one of them.

Specifically, the crash is in the unpickling module, in the function `Unpickler::readInstruction()`. Running this function with the provided crash file results in a crash while calling `auto dict = stack_.at(dict_pos).toGenericDict();` [unpickler.cpp:561](https://github.com/pytorch/pytorch/blob/0e94fbc0c8ab1572c88159c1a4c397b6eb824c01/torch/csrc/jit/serialization/unpickler.cpp#L561). The crash occurs because the index `dict_pos` is out of bounds, which in turn happens because the stack is empty (its size is 0).
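The index in the error message below, 18446744073709551613, is exactly 2^64 − 3. That is consistent with the position being computed as "stack size minus 3" in unsigned (`size_t`) arithmetic on an empty stack: the subtraction wraps around instead of going negative. The sketch below assumes that computation for illustration only; `naive_dict_pos` and `checked_dict_pos` are hypothetical helpers, and `int` stands in for `c10::IValue`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed computation: with an empty stack, size() - 3 wraps around to
// 2^64 - 3 on a 64-bit size_t (unsigned arithmetic cannot go negative).
std::size_t naive_dict_pos(const std::vector<int>& stack) {
  return stack.size() - 3;
}

// Hypothetical guarded version: check the stack depth before computing the
// index, so the error names the real problem instead of a huge index.
std::size_t checked_dict_pos(const std::vector<int>& stack) {
  if (stack.size() < 3) {
    throw std::runtime_error(
        "expected dict, key and value on the stack, but stack size is " +
        std::to_string(stack.size()));
  }
  return stack.size() - 3;
}
```

With the naive version, `stack.at(naive_dict_pos(stack))` still throws `std::out_of_range` (so memory is not corrupted), but if nothing catches it the process calls `std::terminate`, which is what the fuzzer reports. An explicit depth check produces an error message a user can act on.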

Besides this pull request, there is another one related to unpickler hardening: #84343

All tests were performed on this pytorch version: [abc54f9](https://github.com/pytorch/pytorch/tree/abc54f93145830b502400faa92bec86e05422fbd)

### How to reproduce

1. To reproduce the crash, use the provided Docker setup: [Dockerfile](https://github.com/ispras/oss-sydr-fuzz/tree/master/projects/pytorch)

2. Build the container: `docker build -t oss-sydr-fuzz-pytorch-reproduce .`

3. Copy the crash file to the current directory:

    - [crash-042dff5e121580425d9d34d0f293918f3c9fbf1e.zip](https://github.com/pytorch/pytorch/files/10674361/crash-042dff5e121580425d9d34d0f293918f3c9fbf1e.zip)

4. Run the container: ``docker run --privileged --network host -v `pwd`:/homedir --rm -it oss-sydr-fuzz-pytorch-reproduce /bin/bash``

5. Execute the binary: `/message_deserialize_sydr /homedir/crash-042dff5e121580425d9d34d0f293918f3c9fbf1e`

After execution completes, you will see this error message:

```txt
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551613) >= this->size() (which is 0)
```

And this stacktrace:

```asan
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 18446744073709551613) >= this->size() (which is 0)
==39== ERROR: libFuzzer: deadly signal
    #0 0x5d0df1 in __sanitizer_print_stack_trace /llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
    #1 0x545727 in fuzzer::PrintStackTrace() /llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
    #2 0x52b933 in fuzzer::Fuzzer::CrashCallback() /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233:3
    #3 0x7f9118e0341f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f)
    #4 0x7f9118c2300a in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300a)
    #5 0x7f9118c02858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858)
    #6 0x7f9119040910  (/lib/x86_64-linux-gnu/libstdc++.so.6+0x9e910)
    #7 0x7f911904c38b  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa38b)
    #8 0x7f911904c3f6 in std::terminate() (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa3f6)
    #9 0x7f911904c6a8 in __cxa_throw (/lib/x86_64-linux-gnu/libstdc++.so.6+0xaa6a8)
    #10 0x7f91190433aa  (/lib/x86_64-linux-gnu/libstdc++.so.6+0xa13aa)
    #11 0x63acdf in std::vector<c10::IValue, std::allocator<c10::IValue> >::_M_range_check(unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1073:4
    #12 0xce8f93e in std::vector<c10::IValue, std::allocator<c10::IValue> >::at(unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_vector.h:1094:2
    #13 0xce8f93e in torch::jit::Unpickler::readInstruction() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:546:26
    #14 0xce8d527 in torch::jit::Unpickler::run() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:235:27
    #15 0xce8d1c2 in torch::jit::Unpickler::parse_ivalue() /pytorch_fuzz/torch/csrc/jit/serialization/unpickler.cpp:192:3
    #16 0xcdf0792 in torch::jit::unpickle(std::function<unsigned long (char*, unsigned long)>, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:127:20
    #17 0xcdf104d in torch::jit::unpickle(char const*, unsigned long, std::function<c10::StrongTypePtr (c10::QualifiedName const&)>, c10::ArrayRef<at::Tensor>, c10::Type::SingletonOrSharedTypePtr<c10::Type> (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)) /pytorch_fuzz/torch/csrc/jit/serialization/pickle.cpp:137:10
    #18 0xe0532db in torch::distributed::rpc::ScriptRemoteCall::fromMessage(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/script_remote_call.cpp:74:16
    #19 0xe0ffa10 in torch::distributed::rpc::deserializeRequest(torch::distributed::rpc::Message const&) /pytorch_fuzz/torch/csrc/distributed/rpc/utils.cpp:108:14
    #20 0x602a41 in LLVMFuzzerTestOneInput /message_deserialize_fuzz.cc:192:27
    #21 0x52ce61 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #22 0x516d7c in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #23 0x51cacb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:860:9
    #24 0x546062 in main /llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #25 0x7f9118c04082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #26 0x51169d in _start (/message_deserialize_fuzz+0x51169d)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
```
Pull Request resolved: #94300
Approved by: https://github.com/malfet, https://github.com/apach301

Labels

cla signed, oncall: jit, open source, release notes: jit, Stale, triaged
