ZXC v0.4.0

Version 0.4.0 brings major architectural improvements that improve both compression ratio and decompression speed.

Changes

Optimizations & Core Logic

Variable Offset Encoding: Implemented a dynamic encoding scheme using 1-byte offsets for short distances and 2-byte offsets for larger ones, resulting in a more compact bitstream.
VByte Decoding: Added optimized paths to accelerate integer decoding, including a branchless SWAR (SIMD Within A Register) path for generic targets.
LZ Match Encoding: Implemented filtering for long-distance matches to prevent inefficient encoding of short matches with large offsets.
Double Lazy Matching: Added "Double Lazy" strategy for higher compression levels (L4+), finding deeper and better matches.
RLE Optimization: Enhanced both the analysis and encoding/decoding phases for Run-Length Encoding.
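
The long-distance match filter described above can be pictured as a small acceptance test run before a match is emitted. The sketch below is only an illustration: the function name and all three thresholds are hypothetical, and it assumes that offsets up to 255 fit the 1-byte form while anything larger needs the 2-byte form. ZXC's actual limits are not stated in these notes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define MIN_MATCH_LEN      4u    /* assumed shortest match worth encoding */
#define NEAR_OFFSET_LIMIT  255u  /* assumed largest offset that fits in 1 byte */
#define MIN_FAR_MATCH_LEN  6u    /* assumed minimum length for a 2-byte offset */

/* Returns 1 if the match (length, offset) is worth encoding,
 * 0 if emitting the bytes as literals is likely cheaper. */
int match_is_worthwhile(size_t length, uint32_t offset)
{
    if (offset == 0 || length < MIN_MATCH_LEN)
        return 0;                               /* not a usable match */
    if (offset > NEAR_OFFSET_LIMIT)
        return length >= MIN_FAR_MATCH_LEN;     /* far matches must be longer */
    return 1;                                   /* near matches are always kept */
}
```

A match of length 4 at offset 10 passes this test, while the same length at offset 1000 is rejected because the extra offset byte eats most of the saving.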

File Header Format

Before (early versions): Bytes 5-7 are reserved (always 0)
After (v0.4.0): Byte 5 = Chunk Size Code (chunk_size / 4096)
- 0 = Legacy 256KB (backward compatible)
- 64 = 256KB (default)
Impact: Early decoders ignore this field, but v0.4.0 decoders require it for proper block allocation.
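
As a rough illustration of the chunk size code, the mapping between header byte 5 and a chunk size in bytes could look like the sketch below. The helper names are hypothetical and are not part of the ZXC API; only the 4096-byte unit and the meaning of codes 0 and 64 come from the notes above.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_UNIT        4096u           /* from the notes: code = chunk_size / 4096 */
#define LEGACY_CHUNK_SIZE (256u * 1024u)  /* code 0 means the legacy 256KB chunk */

/* Hypothetical helper: map header byte 5 to a chunk size in bytes. */
size_t chunk_size_from_code(uint8_t code)
{
    if (code == 0)
        return LEGACY_CHUNK_SIZE;         /* files written by early versions */
    return (size_t)code * CHUNK_UNIT;
}

/* Hypothetical helper: return the code for a chunk size,
 * or -1 if the size cannot be represented in one byte. */
int chunk_code_from_size(size_t chunk_size)
{
    if (chunk_size == 0 || chunk_size % CHUNK_UNIT != 0)
        return -1;                        /* must be a non-zero multiple of 4096 */
    size_t code = chunk_size / CHUNK_UNIT;
    return code <= 255 ? (int)code : -1;
}
```

Both code 0 and code 64 map to 262144 bytes (64 * 4096 = 256KB), which is why files written by early versions remain readable.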

Performance Improvements

Throughput Maximization: Decompression is faster at all levels.
Compression Ratio: Compressed output is smaller at all levels.
Trade-Off: From level 3 onwards, compression is slower in exchange for the better ratio. Levels 1 and 2 compress faster than before.

Download Guide

Build Selection

| CPU Generation | Linux | Windows | macOS |
|---|---|---|---|
| x86-64 (2006+) | `zxc-linux-x86_64.tar.gz` | `zxc-windows-x64.exe.zip` | - |
| AVX2 (2013+, Haswell) | `zxc-linux-x86_64-avx2.tar.gz` | `zxc-windows-x64-avx2.exe.zip` | - |
| AVX512 (2017+, Skylake-X) | `zxc-linux-x86_64-avx512.tar.gz` | `zxc-windows-x64-avx512.exe.zip` | - |
| ARM64 | `zxc-linux-aarch64.tar.gz` | - | `zxc-macos-arm64.tar.gz` |

Unsure? Use the generic x86-64 build for universal compatibility.

CPU Feature Detection (x86)

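# Linux: prints the first /proc/cpuinfo line that mentions AVX2 or AVX-512
grep -E 'avx512|avx2' /proc/cpuinfo | head -1

# Windows (PowerShell): prints only the CPU model name; look up that model to confirm AVX2/AVX-512 support
Get-WmiObject Win32_Processor | Select-Object Name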

Performance

x86-64: Baseline (SSE2)
AVX2: ~20-30% faster than baseline
AVX512: ~40-60% faster than baseline (tested with Intel SDE)

Build from Source

For optimal CPU-specific performance:

git clone https://github.com/hellobertrand/zxc.git
cd zxc && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DZXC_NATIVE_ARCH=ON ..
cmake --build . --parallel

The `-DZXC_NATIVE_ARCH=ON` option enables `-march=native` for maximum SIMD utilization.

Full Changelog: v0.3.3...v0.4.0

ZXC v0.3.3

Contributors: @tansy

This release focuses on CLI stability and I/O performance optimizations, resolving issues with archive consistency across different output targets.

Changes

Fixed stdout consistency: Resolved a memory management issue in the CLI where custom buffers were freed before stream closure, causing size discrepancies when redirecting output to stdout (Issue #36).
Buffer management: Improved setvbuf handling to ensure safe teardown and flushing of I/O streams across all platforms (Linux, Windows, macOS).
Increments patch version to 0.3.3 by @tansy
Fixes buffering issues with stdin and stdout by @tansy & @hellobertrand in #37
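
The lifetime rule behind the stdout fix is that a buffer handed to `setvbuf` must stay alive until the stream has been flushed and closed. The sketch below is a minimal illustration of that ordering, not the ZXC CLI code: the function name and the 64KB buffer size are assumptions, and it writes to a regular file rather than stdout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes `len` bytes to `path` through a caller-allocated stdio buffer.
 * Returns 0 on success, -1 on failure. */
int write_with_custom_buffer(const char *path, const void *data, size_t len)
{
    const size_t buf_size = (size_t)1 << 16;   /* arbitrary size for this sketch */

    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;

    char *buf = malloc(buf_size);
    if (!buf) {
        fclose(f);
        return -1;
    }

    /* setvbuf must be called before any other operation on the stream.
     * If it fails, the stream simply keeps its default buffer. */
    (void)setvbuf(f, buf, _IOFBF, buf_size);

    int ok = fwrite(data, 1, len, f) == len;

    /* fclose flushes whatever is still sitting in `buf`, so it must run
     * before the buffer is released. Freeing first can truncate output. */
    if (fclose(f) != 0)
        ok = 0;
    free(buf);

    return ok ? 0 : -1;
}
```

For stdout the same rule applies: the custom buffer has to outlive the stream, so it should only be freed after the final flush and close.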

Acknowledgments

A special thank you to @tansy for identifying the archive size discrepancy and providing the comprehensive test case required for reproduction and verification.

Full Changelog: v0.3.2...v0.3.3

ZXC v0.3.2

Improvements

Better file handling (#28)

Improved error handling for edge cases
More robust input validation

Performance optimizations (#29)

Replaced logical NOT with bitwise XOR in hot path
Branchless evaluation for better CPU pipelining
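
To make the XOR and branchless points concrete, here is a small sketch under the assumption that the flags involved are already normalized to 0 or 1. The function names are hypothetical and the code is not taken from ZXC. Whether it beats what the compiler generates for `!flag` depends on the compiler and target.

```c
#include <stdint.h>

/* For a flag known to be exactly 0 or 1, XOR with 1 flips it.
 * Unlike `!flag`, this needs no comparison against zero. */
uint32_t flip_flag(uint32_t flag01)
{
    return flag01 ^ 1u;
}

/* Branchless select: returns `a` when cond01 == 1 and `b` when cond01 == 0. */
uint32_t select_u32(uint32_t cond01, uint32_t a, uint32_t b)
{
    uint32_t mask = 0u - cond01;          /* 1 -> 0xFFFFFFFF, 0 -> 0x00000000 */
    return (a & mask) | (b & ~mask);
}
```

Both helpers give wrong results if the input is not strictly 0 or 1, so the caller has to guarantee that invariant.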

CI/CD

CodeQL security analysis (#26, #27)

Automated vulnerability scanning on every PR
Custom configuration for C/C++ security patterns

Build system modernization (#24)

Updated CMake configuration
Improved CI/CD workflows

Add multi-arch builds (#30)

Adds native architecture flag for CMake builds
Adds multi-architecture build workflow
Adds cross-strip tool support to CMake builds
Add workflows README

Dependabot GitHub Actions updates

Bump github/codeql-action from 3 to 4 (#34)
Bump actions/cache from 4 to 5 (#33)
Bump actions/download-artifact from 4 to 7 (#35)
Bump actions/upload-artifact from 4 to 6 (#32)
Bump actions/checkout from 4 to 6 (#31)

Full Changelog: v0.3.1...v0.3.2

ZXC v0.3.1

Contributors: @hellobertrand

Security Hardening

Fix: the stdout buffer is now flushed before exiting, by @hellobertrand in #22.
Thanks to @xcfmc for reporting the issue and for his help.

Documentation

Add community bindings to readme by @meysam81 in #21
Update WHITEPAPER.md documentation

New Contributors

@meysam81 made their first contribution in #21
Thanks to @meysam81 for creating a Go wrapper for ZXC. You can now easily integrate ZXC into your Go projects.
go-zxc: https://github.com/meysam81/go-zxc

Full Changelog: v0.3.0...v0.3.1

ZXC v0.3.0

Contributors: @tansy

GNR v2, ARM64 Optimizations & Hardened Safety

This release introduces the v2 generation of the internal GNR (General Block) decoder, bringing performance improvements through branchless logic and SIMD vectorization. It also includes a comprehensive security hardening pass, adding rigorous bounds checking and validation to all decoding paths.

Highlights

GNR v2 Decoder Engine

The core decoding loop has been rewritten to maximize instruction-level parallelism:

Branchless Design: Implemented branchless wild copies and match checks to minimize pipeline flushes.
SIMD Acceleration: Added native NEON (ARM64) and AVX2/SSSE3 (x86) implementations for overlapping copy routines.
Hybrid Decoding Strategy: Decompression uses a two-phase approach: careful bounds checking for the first 64KB, then an optimized unchecked path once all possible 16-bit offsets are mathematically guaranteed to be valid. This removes a branch per sequence for 75% of each chunk.
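
The two-phase idea can be sketched as two loops over the same match list. Everything below is an illustrative assumption rather than ZXC's real format: the `match_t` layout, the function name, and the choice to store the distance minus one so that every 16-bit value is a legal distance between 1 and 65536. Once at least 65536 bytes have been produced, no such distance can reach before the start of the output, so only the destination capacity still needs checking. With 256KB chunks, presumably the chunk size in use here, the checked 64KB is 25% of the chunk, which matches the 75% figure above.

```c
#include <stddef.h>
#include <stdint.h>

#define OFFSET_WINDOW 65536u           /* largest distance a 16-bit field can express here */
#define DECODE_ERROR  ((size_t)-1)

/* Hypothetical match record: distance is stored minus one (1..65536). */
typedef struct {
    uint16_t dist_minus_1;
    uint16_t length;
} match_t;

/* Byte-wise copy so overlapping source and destination behave like LZ77. */
static void copy_match(uint8_t *dst, size_t pos, size_t distance, size_t length)
{
    for (size_t k = 0; k < length; k++)
        dst[pos + k] = dst[pos + k - distance];
}

/* Applies `n` matches starting at output position `pos`.
 * Returns the new position, or DECODE_ERROR on invalid input. */
size_t apply_matches(uint8_t *dst, size_t dst_cap, size_t pos,
                     const match_t *m, size_t n)
{
    size_t i = 0;

    /* Phase 1: fewer than 64KB written, so every distance must be validated. */
    for (; i < n && pos < OFFSET_WINDOW; i++) {
        size_t distance = (size_t)m[i].dist_minus_1 + 1;
        if (distance > pos || m[i].length > dst_cap - pos)
            return DECODE_ERROR;
        copy_match(dst, pos, distance, m[i].length);
        pos += m[i].length;
    }

    /* Phase 2: pos >= 65536 >= any distance, so the distance check is dropped. */
    for (; i < n; i++) {
        if (m[i].length > dst_cap - pos)
            return DECODE_ERROR;
        copy_match(dst, pos, (size_t)m[i].dist_minus_1 + 1, m[i].length);
        pos += m[i].length;
    }
    return pos;
}
```

The capacity check stays in both phases. Only the per-sequence distance validation disappears from the second loop.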

Fuzzing-Driven Safety Hardening

Following extensive fuzzing, multiple layers of protection have been added to prevent malformed streams from causing crashes:

Offset & Size Validation: Added rigorous checks for out-of-bounds reads during variable byte decoding and numeric referencing.
Overflow Protection: Implemented detection for integer overflows in VByte reads and destination buffer writes.
Infinite Loop Prevention: Added size limits to variable byte decoding sequences.
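
The three protections listed above can be seen together in a small variable-byte reader. This is a sketch under a stated assumption: it uses the common LEB128-style layout (7 payload bits per byte, least significant group first, high bit set on every byte except the last). These notes do not specify ZXC's actual VByte layout, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define VBYTE_MAX_BYTES 5   /* a 32-bit value needs at most five 7-bit groups */

/* Decodes one value from [p, end). On success stores it in *out and returns
 * the number of bytes consumed (1..5). Returns 0 if the input is truncated,
 * too long, or does not fit in 32 bits. */
size_t vbyte_read_u32(const uint8_t *p, const uint8_t *end, uint32_t *out)
{
    size_t avail = (size_t)(end - p);
    uint32_t value = 0;

    for (size_t i = 0; i < VBYTE_MAX_BYTES; i++) {   /* size limit: no endless loop */
        if (i >= avail)
            return 0;                                /* bounds: never read past end */
        uint8_t b = p[i];
        if (i == 4 && (b & 0xF0u))
            return 0;                                /* overflow or a sixth byte requested */
        value |= (uint32_t)(b & 0x7Fu) << (7 * i);
        if ((b & 0x80u) == 0) {
            *out = value;
            return i + 1;
        }
    }
    return 0;
}
```

A return value of 0 lets the caller abort decoding without ever touching memory outside the input buffer.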

Encoder Optimizations

Fibonacci Hashing: Switched to a faster Fibonacci hash function for better distribution and speed.
Speculative Prefetching: Added memory prefetching for hash table entries to reduce cache miss latency.
Branchless Match Finder: Refactored the encoder's match checker to use bitwise masks instead of conditional branches.
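
For readers unfamiliar with Fibonacci hashing, a minimal sketch follows. It multiplies the key by 2^32 divided by the golden ratio and keeps the top bits, so the table index comes from a shift rather than a mask. The `HASH_BITS` value of 14 and the function name are assumptions for illustration. ZXC's own constant and table size may differ.

```c
#include <stdint.h>

#define HASH_BITS 14u   /* hypothetical: a table of 2^14 entries */

/* Fibonacci (multiplicative) hash of a 32-bit key.
 * 0x9E3779B9 is floor(2^32 / golden ratio). The multiplication wraps
 * modulo 2^32 and the top HASH_BITS bits become the table index. */
uint32_t fib_hash32(uint32_t key)
{
    return (key * 0x9E3779B9u) >> (32u - HASH_BITS);
}
```

Taking the high bits matters because they depend on every bit of the key, whereas masking the low bits of the product would ignore the key's upper bytes.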

Special Thanks

Thanks to @tansy for rewriting the CLI client, including the addition of short options and standardizing compression level flags (-1 to -5).

Add short options to help, version by @tansy in #15
Use -1..-5 as compression levels by @tansy in #16

Performance

Level -1: Added a new compression level, -1.
GNR v2: Replaced legacy decoder with v2; utilizes single-sequence processing and streamlined loops.
SIMD: Added 32-byte copy routines and NEON shuffle optimizations for small offsets (2-15 bytes).
Prefetching: Implemented speculative prefetching for hash chain entries.
Hashing: Replaced masking with shifts derived from ZXC_LZ_HASH_BITS (Fibonacci variant).
Memory: Used RESTRICT keyword on critical hot paths to aid compiler optimization.

Safety & Integrity

Validation: Added destination bounds checks before writing literals in generic number decoding.
VByte: Added strict bounds checking and overflow detection to variable byte decoding.
Stream: Validated stream sizes against sequence counts for early error detection.
Sanitization: Fixed potential out-of-bounds reads in the fast path by falling back to the safe path when remaining data is small.

Internals & Refactoring

Cleanup: Removed dead code for the original v1 GNR decoder.
Formatting: Renamed variables for clarity and updated internal documentation.
CI: Made fuzzer build scripts dynamic and updated benchmark workflows.

Full Changelog: v0.2.0...v0.3.0

ZXC v0.2.0

Contributors: @tzcnt

This release brings significant security hardening, performance optimizations, and a major structural refactor of the public API.

Special Thanks

A huge shoutout to @tzcnt for their first public contribution! They spearheaded the restructuring of the public headers to provide a cleaner "sans-IO" API (#9). This makes integrating zxc into projects that manage their own I/O significantly easier. Thank you for your contribution!

Security Hardening

This release includes comprehensive security improvements ensuring robustness against malformed or malicious inputs:

Decompression Bounds Checking: Implemented strict bounds checking in the decompression fast paths to prevent input buffer over-reads and invalid offset access.
VByte Hardening: Hardened variable-byte integer reading logic to prevent buffer overruns and potential infinite loops with malformed data.
Memory Safety: Fixed a MemorySanitizer (MSan) warning by explicitly zero-initializing memory blocks in the stream engine, ensuring no uninitialized values leak into the output.

Performance Improvements

Reduced Thread Contention: Optimized the stream engine to reduce lock contention, improving scalability on high-core-count systems.
Short-Circuit Optimization: Optimized decompression safety checks to short-circuit expensive offset validation for valid large blocks (>64KB), recovering performance while maintaining safety.
Memory Usage: Reduced memory footprint of the chain table.
Buffer Management: Refactored buffer allocation strategies for better I/O performance.

API & Refactoring

Sans-IO API: Public headers have been restructured to separate core compression logic from file I/O utilities.
Bug Fixes: Various fixes for edge cases in raw block handling and fuzzing tests.

Full Changelog

Restructure public headers to provide a "sans-IO" API (#9) (tzcnt)
Initializes memory block after allocation (Fix MSan uninitialized bytes)
Adds comprehensive checks to prevent buffer overflows in decompression
Optimize hot path logic for decompression
Raises capacity checks to avoid buffer overflows
Fixes fuzzers names and updates fuzzing schedule
Reduces memory usage of chain table
Reduces thread contention in stream engine
Updates atomic type definitions and I/O error handling
Format code and cleanup unused docs

Full Changelog: v0.1.2...v0.2.0

ZXC v0.1.2

Fixes & Reliability

Safety: Added input validation to prevent potential crashes. (6bdde9b)
Safety: Fixed a potential null pointer dereference issue. (df0b7f3)
Cleanup: Removed a redundant file pointer check to clean up the code. (64f7088)

Documentation

Docs: Added missing parameter documentation in header files. (70291b8)

Tests

Added additional unit tests to improve coverage. (628eb35)

Full Changelog: v0.1.1...v0.1.2

ZXC v0.1.1

Contributors: @hellobertrand

What's Changed

Improves fuzzing and benchmarking by @hellobertrand in #7

Full Changelog: v0.1.0...v0.1.1

ZXC v0.1.0 - Hello World