Environment
- OS: Ubuntu 22.04
- Compiler: clang 13.0.1
- Sanitizers: AddressSanitizer (ASan) + UndefinedBehaviorSanitizer (UBSan)
Build Instructions
export CC=clang-13
export CXX=clang++-13
export CXXFLAGS="${CXXFLAGS} -std=c++17 -stdlib=libstdc++ -fsanitize=address -O1 -g"
export CFLAGS="${CFLAGS} -fsanitize=address -O1 -g"
export LDFLAGS="${LDFLAGS} -fsanitize=address"
export LIB_FUZZING_ENGINE="-fsanitize=fuzzer"
sed -i 's/CMAKE_CXX_STANDARD 11/CMAKE_CXX_STANDARD 17/g' CMakeLists.txt
sed -i 's/std::random/\/\/std::random/g' test/*.cpp
mkdir build && cd build
cmake .. -DBUILD_SHARED=OFF -DBUILD_MIXED=ON
make -j $(nproc)
Reproduction
Run the fuzzer with the crafted input file from the attached poc.zip.
Observed Behavior
The program crashes with the following AddressSanitizer report:
==1098880==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000013f00
READ of size 2 at 0x603000013f00 thread T0
#0 0x5257b6 in StrtolFixAndCheck ...
#1 0x55d200 in __interceptor_strtol
#2 0x713674 in atoi /usr/include/stdlib.h:364:16
#3 0x713674 in OpenBabel::GAMESSOutputFormat::ReadMolecule .../gamessformat.cpp:262:16
#4 0xea2211 in OpenBabel::OBMoleculeFormat::ReadChemObjectImpl .../obmolecformat.cpp:101:18
...
0x603000013f00 is located 0 bytes inside of 24-byte region ... (freed here)
#12 0xe8975c in OpenBabel::tokenize(...) .../tokenst.cpp:39:9
previously allocated here
#6 0xe89b9a in OpenBabel::tokenize(...) .../tokenst.cpp:53:27
...
Root Cause Analysis
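GAMESSOutputFormat::ReadMolecule (gamessformat.cpp:262) tokenizes a line with OpenBabel::tokenize and then converts one of the tokens to an integer via atoi. According to the ASan trace, the token's 24-byte string buffer was allocated under tokenize (tokenst.cpp:53) and freed under a tokenize call (tokenst.cpp:39), apparently when the token vector is cleared. atoi/strtol still reads from that buffer afterwards, resulting in a heap-use-after-free.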
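A possible hardening pattern is sketched below. It assumes the stale read comes from converting a token that the most recent tokenize call did not actually produce, which the trace suggests but does not confirm. The helpers `split_whitespace` and `token_as_int` are hypothetical names used only for illustration. They are not Open Babel APIs and do not reproduce the code in gamessformat.cpp.

```cpp
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a tokenizer: clears `out`, then splits
// `line` on whitespace. Any pointer into the old tokens is invalid
// after this call returns.
static void split_whitespace(std::vector<std::string>& out,
                             const std::string& line) {
    out.clear();
    std::istringstream in(line);
    std::string tok;
    while (in >> tok) {
        out.push_back(tok);
    }
}

// Hypothetical helper: parse tokens[index] as an int only if that
// token exists in the current vector. Returns std::nullopt when the
// index is out of range, no digits were consumed, or the value does
// not fit in an int. Like atoi, trailing non-digit characters are
// ignored.
static std::optional<int> token_as_int(const std::vector<std::string>& tokens,
                                       std::size_t index) {
    if (index >= tokens.size()) {
        return std::nullopt;  // the line has fewer tokens than expected
    }
    const std::string& tok = tokens[index];
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(tok.c_str(), &end, 10);
    if (end == tok.c_str() || errno == ERANGE ||
        value < INT_MIN || value > INT_MAX) {
        return std::nullopt;
    }
    return static_cast<int>(value);
}
```

A caller would tokenize the line, call `token_as_int(tokens, n)`, and skip or reject the line when the result is empty, instead of passing `tokens[n].c_str()` to atoi unconditionally.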
The attached file contains a proof-of-concept.
poc.zip