Bug
(originally reported by @jackluo923 )
The clp::streaming_compression::zstd::Decompressor incorrectly returns EOF errors when the input buffer is fully consumed, even when the zstd decompression context still contains buffered data that can be flushed.
According to the zstd manual, when output.pos == output.size (output buffer is full), there may still be data remaining in the decompression context. The manual states that developers should "call ZSTD_decompressStream() again to flush whatever remains in the buffer."
On the next read from the decompressor, our current implementation only checks whether more bytes can be consumed from the input buffer and returns EOF if not. It ignores the fact that calling ZSTD_decompressStream() again could yield additional data from the internal buffers.
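To make the failure mode concrete, here is a minimal sketch using a toy model rather than zstd itself. ToyDecoder, Reader, and the flush_when_input_consumed flag are hypothetical names invented for illustration. They are not part of CLP or the zstd API, and the "decoding" is just a byte copy. The toy only mimics the relevant behaviour: the decoder may consume all of its input in one call while holding output internally until the caller provides room for it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a streaming decompression context (NOT the zstd API).
// It swallows all remaining input in one call and keeps any output that does not
// fit into the caller's buffer in an internal buffer.
class ToyDecoder {
public:
    // Consumes `in` from `in_pos` to the end, then copies as much pending output
    // as fits into `out`. Returns the number of bytes written to `out`.
    size_t decode(std::string const& in, size_t& in_pos, char* out, size_t out_capacity) {
        m_pending.append(in, in_pos, std::string::npos);
        in_pos = in.size();
        size_t const num_bytes = std::min(out_capacity, m_pending.size());
        std::memcpy(out, m_pending.data(), num_bytes);
        m_pending.erase(0, num_bytes);
        return num_bytes;
    }

private:
    std::string m_pending;  // Output produced but not yet handed to the caller
};

enum class Status { Success, Eof };

// Hypothetical reader. With `flush_when_input_consumed == false` it models the
// check described in this report; with `true` it models the expected behaviour.
class Reader {
public:
    Reader(std::string input, bool flush_when_input_consumed)
            : m_input{std::move(input)},
              m_flush_when_input_consumed{flush_when_input_consumed} {}

    Status read_exact(char* buf, size_t len) {
        size_t total = 0;
        while (total < len) {
            bool const input_consumed = (m_in_pos == m_input.size());
            if (input_consumed && false == m_flush_when_input_consumed) {
                // Reported behaviour: no input left, so give up immediately
                return Status::Eof;
            }
            size_t const num_bytes
                    = m_decoder.decode(m_input, m_in_pos, buf + total, len - total);
            if (0 == num_bytes && input_consumed) {
                // No input left and the decoder had nothing to flush: real EOF
                return Status::Eof;
            }
            total += num_bytes;
        }
        return Status::Success;
    }

private:
    std::string m_input;
    size_t m_in_pos{0};
    bool m_flush_when_input_consumed;
    ToyDecoder m_decoder;
};
```

With a 10-byte input, a 4-byte read followed by a 1-byte read reproduces the pattern in the test case below: the reader that checks only the input position returns Eof on the second read, while the one that calls the decoder again succeeds. In this toy, EOF is reported only once the input is exhausted and a further decode call produces no output.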
CLP version
c69d4cf
Environment
The issue was originally found in clp-ffi-js, which is compiled with Emscripten 3.0.67 (derived from LLVM).
The following environment was used to reproduce the issue with the steps below:
g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
Reproduction steps
- Download and extract the test file compressed-logs.clp.zst from compressed-logs.clp.zst.zip
- Add this test case to components/core/tests/test-StreamingCompression.cpp:
    SECTION("ZStd compression without flush") {
        decompressor = std::make_unique<clp::streaming_compression::zstd::Decompressor>();
        clp::ReadOnlyMemoryMappedFile const memory_mapped_compressed_file{
                "/home/junhao/Downloads/compressed-logs.clp.zst"
        };
        auto const compressed_file_view{memory_mapped_compressed_file.get_view()};
        decompressor->open(compressed_file_view.data(), compressed_file_view.size());
        std::ranges::fill(decompressed_buffer.begin(), decompressed_buffer.end(), 0);

        // First read should succeed
        auto read_result = decompressor->try_read_exact_length(decompressed_buffer.data(), 4);
        REQUIRE(read_result == ErrorCode_Success);

        // Second read should also succeed but currently fails with EOF
        read_result = decompressor->try_read_exact_length(decompressed_buffer.data(), 1);
        REQUIRE(read_result == ErrorCode_Success);  // This assertion fails

        decompressor->close();
    }
- Run the test. The second try_read_exact_length call will fail with EOF.
- However, using zstdcat confirms the file contains approximately 260 uncompressed bytes, which is significantly more than the 5 bytes (4 + 1) requested in the test:
$ zstdcat compressed-logs.clp.zst | wc -c
compressed-logs.clp.zst : Read error (39) : premature end
260
The second try_read_exact_length call should succeed and read the requested number of bytes, not return an EOF error.