clp::streaming_compression::zstd::Decompressor prematurely returns EOF when buffered data remains #976

@junhaoliao

Bug

(originally reported by @jackluo923)

The clp::streaming_compression::zstd::Decompressor incorrectly returns EOF errors when the input buffer is fully consumed, even when the zstd decompression context still contains buffered data that can be flushed.

According to the zstd manual, when output.pos == output.size (output buffer is full), there may still be data remaining in the decompression context. The manual states that developers should "call ZSTD_decompressStream() again to flush whatever remains in the buffer."

On the next read from the decompressor, our current implementation only checks whether more bytes can be consumed from the input buffer and returns EOF if not. It ignores that calling ZSTD_decompressStream() again could yield additional data from zstd's internal buffers.
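To illustrate the intended EOF condition, here is a minimal, self-contained sketch. It does not use zstd or CLP code: `MockDecoder`, `SketchReader`, `InBuf`, `OutBuf`, and `Status` are hypothetical names invented for this example, and the mock only mimics the `size`/`pos` buffer convention of ZSTD_decompressStream(). The idea, under those assumptions, is to remember whether the previous call filled the output buffer and to report EOF only when the input is consumed and the decoder has nothing left to flush.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Buffer descriptors shaped like zstd's in/out buffers (size + pos). Hypothetical types.
struct InBuf { char const* src; size_t size; size_t pos; };
struct OutBuf { char* dst; size_t size; size_t pos; };

// Hypothetical stand-in for a streaming decoder: it swallows ALL available input
// into an internal buffer at once, then emits only as much as fits in the output.
class MockDecoder {
public:
    void decompress(OutBuf& out, InBuf& in) {
        m_pending.append(in.src + in.pos, in.size - in.pos);
        in.pos = in.size;
        size_t const n = std::min(out.size - out.pos, m_pending.size() - m_offset);
        std::memcpy(out.dst + out.pos, m_pending.data() + m_offset, n);
        out.pos += n;
        m_offset += n;
        assert(out.pos <= out.size);
    }

private:
    std::string m_pending;
    size_t m_offset{0};
};

enum class Status { Success, EndOfFile };

class SketchReader {
public:
    explicit SketchReader(std::string_view data) : m_in{data.data(), data.size(), 0} {}

    Status try_read_exact_length(char* dst, size_t len) {
        OutBuf out{dst, len, 0};
        while (out.pos < out.size) {
            // EOF only if the input is consumed AND the previous call did not fill
            // its output buffer, i.e. the decoder cannot be holding buffered data.
            if (m_in.pos == m_in.size && !m_may_have_buffered_output) {
                return Status::EndOfFile;
            }
            m_decoder.decompress(out, m_in);
            // A full output buffer means more data may remain inside the decoder.
            m_may_have_buffered_output = (out.pos == out.size);
        }
        return Status::Success;
    }

private:
    InBuf m_in;
    MockDecoder m_decoder;
    bool m_may_have_buffered_output{false};
};
```

With a 10-byte input, a 4-byte read followed by a 1-byte read both succeed in this sketch, even though the input buffer is already fully consumed after the first call. A reader that checked only `m_in.pos == m_in.size` would return EOF on the second read, which is the behaviour described in this issue.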

CLP version

c69d4cf

Environment

The issue was originally found in clp-ffi-js, which is compiled with Emscripten 3.0.67 (derived from LLVM).

The following environment was used to reproduce the issue with the steps below:

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

Reproduction steps

  1. Download and extract the test file compressed-logs.clp.zst from compressed-logs.clp.zst.zip

  2. Add this test case to components/core/tests/test-StreamingCompression.cpp:

    SECTION("ZStd compression without flush") {
        decompressor = std::make_unique<clp::streaming_compression::zstd::Decompressor>();
        // Adjust this path to wherever compressed-logs.clp.zst was extracted
        clp::ReadOnlyMemoryMappedFile const memory_mapped_compressed_file{
            "/home/junhao/Downloads/compressed-logs.clp.zst"
        };
        auto const compressed_file_view{memory_mapped_compressed_file.get_view()};
        decompressor->open(compressed_file_view.data(), compressed_file_view.size());
    
        std::ranges::fill(decompressed_buffer.begin(), decompressed_buffer.end(), 0);
        
        // First read should succeed
        auto read_result = decompressor->try_read_exact_length(decompressed_buffer.data(), 4);
        REQUIRE(read_result == ErrorCode_Success);
        
        // Second read should also succeed but currently fails with EOF
        read_result = decompressor->try_read_exact_length(decompressed_buffer.data(), 1);
        REQUIRE(read_result == ErrorCode_Success);  // This assertion fails
    
        decompressor->close();
    }

  3. Run the test. The second try_read_exact_length call fails with EOF.

  4. However, zstdcat shows that the file decompresses to 260 bytes (while also reporting a premature end of the stream), which is far more than the 5 bytes (4 + 1) requested in the test:

    $ zstdcat compressed-logs.clp.zst | wc -c
    compressed-logs.clp.zst : Read error (39) : premature end 
       260

    The second try_read_exact_length call should succeed and read the requested number of bytes, not return an EOF error.

Labels: bug