Pure Rust implementation of archive and compression formats with core algorithms implemented from scratch.
OxiArc is a comprehensive archive/compression library and CLI tool written in pure Rust. It supports multiple archive formats and compression algorithms, all implemented without C bindings or external compression libraries, and it is built from the ground up with performance and safety in mind.
- ZIP - PKZIP format with DEFLATE and Store methods, Zip64 support
- TAR - POSIX tar with UStar and PAX extended headers
- GZIP - GNU zip single-file compression (RFC 1952)
- LZH/LHA - Japanese archive format with lh0-lh7 methods
- XZ - Modern LZMA2 compression format
- 7z - 7-Zip archive format (read-only)
- CAB - Microsoft Cabinet format (read-only)
- LZ4 - Fast LZ4 frame format
- Zstandard - Facebook's fast compression format
- Bzip2 - Block-sorting compression
- Brotli - Brotli compression (RFC 7932)
- Snappy - Google's fast compression format
- DEFLATE (RFC 1951) - LZ77 + Huffman, levels 0-9, async deflate support
- LZMA/LZMA2 - Range coding with context modeling
- LZH - LZSS + Huffman (lh0, lh4-lh7)
- Bzip2 - BWT + MTF + RLE + Huffman
- LZ4 - Ultra-fast LZ77 variant with LZ4-HC
- Zstandard - FSE + Huffman entropy coding
- LZW - Lempel-Ziv-Welch for TIFF and GIF compression (MSB/LSB bitstream)
- Brotli (RFC 7932) - LZ77 + context-dependent Huffman, static dictionary, quality 0-11
- Snappy - Ultra-fast LZ77 variant with block and framed formats
- Store - No compression
- Pure Rust - No C/Fortran dependencies, 100% safe Rust
- Optimized CRC - Slicing-by-8 implementation (3-5x faster than byte-at-a-time table lookup)
- Modern CLI - Progress bars, verbose output, JSON support, shell completions
- Streaming API - Memory-efficient processing with stdin/stdout support
- Async I/O - Async ZIP and async deflate support (async-io feature flag)
- Streaming API - GzipStream/ZlibStream/LzwStream encoders/decoders with flush modes
- Dry-Run Mode - Preview operations without writing files
- EntryBuilder - Fluent API for building archive entries
- Pattern Filtering - Include/exclude patterns with glob syntax
- Metadata Preservation - Timestamps, permissions, extended attributes
- Auto-detection - Automatic format detection from magic bytes
- Flexible Overwrite - Overwrite, skip, or prompt modes
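The auto-detection feature listed above can be illustrated with a small signature matcher. This is a minimal sketch using only the standard library; the `Detected` enum and `detect_magic` function are hypothetical names for illustration and are not OxiArc's API. The signature bytes are the published magic numbers of each format.

```rust
/// Formats recognized by this sketch (illustrative type, not OxiArc's).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Detected {
    Zip,
    Gzip,
    Xz,
    SevenZ,
    Cab,
    Lz4,
    Zstd,
    Bzip2,
    Tar,
    Unknown,
}

/// Guess a format from the leading bytes of a file.
pub fn detect_magic(buf: &[u8]) -> Detected {
    // Signatures that sit at offset 0.
    const SIGNATURES: &[(&[u8], Detected)] = &[
        (b"PK\x03\x04", Detected::Zip),
        (b"\x1f\x8b", Detected::Gzip),
        (b"\xfd7zXZ\x00", Detected::Xz),
        (b"7z\xbc\xaf\x27\x1c", Detected::SevenZ),
        (b"MSCF", Detected::Cab),
        (b"\x04\x22\x4d\x18", Detected::Lz4),
        (b"\x28\xb5\x2f\xfd", Detected::Zstd),
        (b"BZh", Detected::Bzip2),
    ];
    for (magic, format) in SIGNATURES {
        if buf.starts_with(magic) {
            return *format;
        }
    }
    // TAR has no leading signature; UStar stores "ustar" at offset 257.
    if buf.len() >= 262 && &buf[257..262] == b"ustar" {
        return Detected::Tar;
    }
    Detected::Unknown
}
```

Formats without a fixed signature (raw Brotli streams, for example) cannot be recognized this way and would need an extension-based or caller-supplied hint.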
+----------------------------------------------------------+
| L4: Unified API (oxiarc-cli) |
| CLI with progress bars, verbose mode, filters |
+----------------------------------------------------------+
| L3: Container (oxiarc-archive) |
| ZIP, TAR, GZIP, LZH, XZ, 7z, CAB, LZ4, Zstd, Bzip2, Brotli, Snappy |
+----------------------------------------------------------+
| L2: Codecs |
| oxiarc-deflate: DEFLATE (RFC 1951) + async + GZip |
| oxiarc-lzma: LZMA/LZMA2 |
| oxiarc-lzhuf: LZH (lh0-lh7) |
| oxiarc-bzip2: BWT + MTF + Huffman |
| oxiarc-lz4: LZ4 block/frame |
| oxiarc-zstd: Zstandard (FSE + Huffman) |
| oxiarc-lzw: LZW (GIF/TIFF, MSB/LSB bitstream) |
| oxiarc-brotli: Brotli (RFC 7932) |
| oxiarc-snappy: Snappy (block + framed) |
+----------------------------------------------------------+
| L1: Core (oxiarc-core) |
| BitReader/Writer, RingBuffer, CRC-16/32/64 (slicing-by-8) |
+----------------------------------------------------------+
| Crate | Description | Lines | Tests |
|---|---|---|---|
| oxiarc-core | Core primitives: BitStream, RingBuffer, CRC-16/32/64 (slicing-by-8), EntryBuilder, Serde | ~4,373 | 111 |
| oxiarc-deflate | DEFLATE (RFC 1951) + async deflate + GZip + streaming (GzipStream/ZlibStream) | ~4,522 | 120 |
| oxiarc-lzhuf | LZH compression (lh0-lh7) with LZSS + Huffman | ~3,436 | 54 |
| oxiarc-bzip2 | Bzip2 with BWT + MTF + RLE + Huffman | ~2,037 | 37 |
| oxiarc-lz4 | LZ4 block/frame + LZ4-HC with XXHash32, acceleration parameter | ~4,120 | 110 |
| oxiarc-zstd | Zstandard with FSE + Huffman + XXHash64, dictionary support | ~6,207 | 170 |
| oxiarc-lzma | LZMA/LZMA2 with range coding + hash chains | ~4,191 | 66 |
| oxiarc-archive | 12 container formats (ZIP, TAR, GZIP, LZH, XZ, 7z, CAB, LZ4, Zstd, Bzip2, Brotli, Snappy) + async ZIP | ~8,389 | 165 |
| oxiarc-lzw | LZW compression (GIF/TIFF) with MSB/LSB bitstream, streaming encoder/decoder | ~2,094 | 76 |
| oxiarc-brotli | Brotli compression (RFC 7932) with static dictionary, quality 0-11, streaming | ~3,536 | 78 |
| oxiarc-snappy | Snappy compression (block + framed format) with CRC32C | ~1,451 | 54 |
| oxiarc-cli | CLI tool with progress bars, filters, JSON output, dry-run mode | ~2,947 | - |
| Total | Pure Rust archive/compression library | ~47,303 | 1041 |
cargo install oxiarc-cli

git clone https://github.com/cool-japan/oxiarc
cd oxiarc
cargo build --release
cargo install --path oxiarc-cli

[dependencies]
oxiarc-archive = "0.2.6" # For archive format support
oxiarc-deflate = "0.2.6" # For DEFLATE compression
oxiarc-lzma = "0.2.6" # For LZMA/LZMA2 compression
oxiarc-bzip2 = "0.2.6" # For Bzip2 compression
oxiarc-lz4 = "0.2.6" # For LZ4 compression
oxiarc-zstd = "0.2.6" # For Zstandard compression
oxiarc-brotli = "0.2.6" # For Brotli compression
oxiarc-snappy = "0.2.6" # For Snappy compression

# List archive contents
oxiarc list archive.zip
oxiarc list archive.7z --verbose
# Extract archives
oxiarc extract archive.zip
oxiarc extract data.tar.gz -o output/
oxiarc extract files.7z --progress
# Create archives
oxiarc create backup.zip file1.txt file2.txt folder/
oxiarc create data.tar dir1/ dir2/
oxiarc create compressed.xz large_file.bin
# Test integrity
oxiarc test archive.zip
oxiarc test data.lzh --verbose
# Show detailed information
oxiarc info archive.7z
oxiarc info data.cab
# Detect format
oxiarc detect unknown_file.bin
# Convert between formats
oxiarc convert old.lzh new.zip
oxiarc convert data.7z backup.tar

use oxiarc_deflate::{deflate, inflate};
use oxiarc_archive::ZipReader;
use std::fs::File;

// Compress data with DEFLATE
let compressed = deflate(b"Hello, World!", 6)?;
let decompressed = inflate(&compressed)?;

// Read a ZIP archive
let file = File::open("archive.zip")?;
let mut zip = ZipReader::new(file)?;
for entry in zip.entries() {
    println!("{}: {} bytes", entry.name, entry.size);
}

The standard compression used in ZIP, GZIP, and PNG:
- LZ77 dictionary compression with 32KB sliding window
- Canonical Huffman coding
- Supports stored, fixed, and dynamic blocks
- Compression levels 0-9
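Canonical Huffman coding means that only the code lengths need to be transmitted; the codes themselves are rebuilt deterministically. The sketch below follows the three-step procedure from RFC 1951, section 3.2.2. The function name `canonical_codes` is an illustrative assumption, not an OxiArc API, and the input is assumed to contain valid lengths of at most 15 bits.

```rust
/// Assign canonical Huffman codes from code lengths (RFC 1951, section 3.2.2).
/// Returns `(code, length)` per symbol; a length of 0 marks an unused symbol.
/// Assumes every length is at most 15, as in DEFLATE.
pub fn canonical_codes(lengths: &[u8]) -> Vec<(u32, u8)> {
    const MAX_BITS: usize = 15;

    // Step 1: count how many symbols use each code length.
    let mut bl_count = [0u32; MAX_BITS + 1];
    for &len in lengths {
        if len > 0 {
            bl_count[len as usize] += 1;
        }
    }

    // Step 2: compute the numerically smallest code for each length.
    let mut next_code = [0u32; MAX_BITS + 1];
    let mut code = 0u32;
    for bits in 1..=MAX_BITS {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    // Step 3: hand out consecutive codes to symbols in symbol order.
    lengths
        .iter()
        .map(|&len| {
            if len == 0 {
                (0, 0)
            } else {
                let assigned = next_code[len as usize];
                next_code[len as usize] += 1;
                (assigned, len)
            }
        })
        .collect()
}
```

With the lengths from the RFC's own example, `[3, 3, 3, 3, 3, 2, 4, 4]`, the symbols receive the codes 010, 011, 100, 101, 110, 00, 1110, and 1111.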
Japanese archive format compression:
- LZSS with configurable window sizes (4KB-64KB)
- Static Huffman coding with dual trees (codes + offsets)
- Methods: lh0 (stored), lh4, lh5, lh6, lh7
Advanced compression used in 7z and XZ:
- LZ77-style dictionary compression
- Range coding for entropy encoding
- Context-dependent probability models
- 11-bit adaptive bit probabilities (2048 quantization levels)
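The adaptive probability model behind LZMA's range coder can be sketched in a few lines. Each modeled bit has an 11-bit estimate of the probability that the next bit is 0, and the estimate moves toward the observed bit by 1/32 of the remaining distance. The `BitModel` type below is a hypothetical illustration rather than the oxiarc-lzma implementation; the constants (11 bits, shift of 5, initial value 1024) are the ones used by the LZMA format.

```rust
const PROB_BITS: u32 = 11; // probabilities are scaled to 0..2048
const MOVE_BITS: u32 = 5; // adaptation rate: move 1/32 of the way per bit
const PROB_INIT: u16 = 1 << (PROB_BITS - 1); // 1024, i.e. P(bit = 0) = 0.5

/// Adaptive estimate of P(next bit = 0) for one coding context.
pub struct BitModel {
    prob: u16,
}

impl BitModel {
    pub fn new() -> Self {
        Self { prob: PROB_INIT }
    }

    /// Current estimate, scaled so that 2048 would mean certainty.
    pub fn prob(&self) -> u16 {
        self.prob
    }

    /// Nudge the estimate toward the bit that was just coded.
    pub fn update(&mut self, bit: u8) {
        if bit == 0 {
            self.prob += ((1u16 << PROB_BITS) - self.prob) >> MOVE_BITS;
        } else {
            self.prob -= self.prob >> MOVE_BITS;
        }
    }
}
```

Because the shift rounds toward zero, the estimate never reaches 0 or 2048, so the range coder always leaves some code space for the less likely bit.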
Block-sorting compression:
- Burrows-Wheeler Transform (BWT)
- Move-To-Front (MTF) coding
- Run-Length Encoding (RLE)
- Huffman coding
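The Move-To-Front stage in the pipeline above is small enough to show in full. This is a generic sketch of the technique using only the standard library, not the oxiarc-bzip2 code; the function names are assumptions.

```rust
/// Move-To-Front encode: each byte is replaced by its current position in a
/// recency list, and the byte is then moved to the front of that list.
pub fn mtf_encode(input: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0u8..=255).collect();
    let mut out = Vec::with_capacity(input.len());
    for &byte in input {
        // The table is a permutation of all 256 values, so this always stops.
        let mut idx = 0;
        while table[idx] != byte {
            idx += 1;
        }
        out.push(idx as u8);
        // Rotating the prefix moves table[idx] to position 0.
        table[..=idx].rotate_right(1);
    }
    out
}

/// Inverse transform: look up each index, then apply the same list update.
pub fn mtf_decode(indices: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0u8..=255).collect();
    let mut out = Vec::with_capacity(indices.len());
    for &idx in indices {
        let idx = idx as usize;
        out.push(table[idx]);
        table[..=idx].rotate_right(1);
    }
    out
}
```

After a BWT, equal bytes tend to cluster, so the MTF output is dominated by small values and zero runs, which is what the later RLE and Huffman stages exploit.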
Ultra-fast compression:
- Simple LZ77 variant
- Block and frame formats
- Minimal CPU overhead
Modern fast compression:
- Finite State Entropy (FSE)
- Huffman coding
- XXHash64 checksums
- Dictionary support
Lempel-Ziv-Welch compression:
- GIF LZW codec with configurable initial code size
- LSB-first bitstream packing (GIF standard)
- MSB-first bitstream packing (TIFF standard)
- Variable bit widths (2-12 bits) with clear/EOI codes
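The dictionary logic of LZW, including the clear and end-of-information (EOI) codes mentioned above, can be sketched at the level of code values. This sketch deliberately leaves out the variable-width LSB/MSB bit packing and is not the oxiarc-lzw implementation; the function names are assumptions, `min_code_size` is assumed to be at most 8, and every input byte is assumed to be smaller than `1 << min_code_size`.

```rust
use std::collections::HashMap;

/// Encode bytes into LZW code values (no bit packing).
pub fn lzw_encode(data: &[u8], min_code_size: u8) -> Vec<u16> {
    let clear: u16 = 1 << min_code_size;
    let eoi = clear + 1;
    let mut dict: HashMap<(u16, u8), u16> = HashMap::new();
    let mut next = eoi + 1; // first free dictionary code
    let mut out = vec![clear];
    let mut iter = data.iter().copied();
    let Some(first) = iter.next() else {
        out.push(eoi);
        return out;
    };
    let mut cur = first as u16;
    for byte in iter {
        if let Some(&code) = dict.get(&(cur, byte)) {
            cur = code; // extend the current phrase
        } else {
            out.push(cur);
            if next < 4096 {
                dict.insert((cur, byte), next);
                next += 1;
            } else {
                // 12-bit code space exhausted: reset the dictionary.
                out.push(clear);
                dict.clear();
                next = eoi + 1;
            }
            cur = byte as u16;
        }
    }
    out.push(cur);
    out.push(eoi);
    out
}

/// Decode code values produced by `lzw_encode`.
pub fn lzw_decode(codes: &[u16], min_code_size: u8) -> Vec<u8> {
    let clear: u16 = 1 << min_code_size;
    let eoi = clear + 1;
    let reset = |table: &mut Vec<Vec<u8>>| {
        table.clear();
        for b in 0..clear {
            table.push(vec![b as u8]);
        }
        table.push(Vec::new()); // placeholder for the clear code
        table.push(Vec::new()); // placeholder for the EOI code
    };
    let mut table: Vec<Vec<u8>> = Vec::new();
    let mut out = Vec::new();
    let mut prev: Option<Vec<u8>> = None;
    for &code in codes {
        if code == clear {
            reset(&mut table);
            prev = None;
            continue;
        }
        if code == eoi {
            break;
        }
        let entry = if (code as usize) < table.len() {
            table[code as usize].clone()
        } else if let Some(p) = &prev {
            // Code not in the table yet: it must be prev + first byte of prev.
            let mut e = p.clone();
            e.push(p[0]);
            e
        } else {
            break; // malformed stream
        };
        out.extend_from_slice(&entry);
        if let Some(mut p) = prev.take() {
            if table.len() < 4096 {
                p.push(entry[0]);
                table.push(p);
            }
        }
        prev = Some(entry);
    }
    out
}
```

For the input `ABABABA` with `min_code_size = 8`, the encoder emits the clear code 256, then 65, 66, 258, 260, and finally the EOI code 257. The code 260 is used before the decoder has stored it, which is the special case handled in the decoder's middle branch.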
Modern compression format:
- LZ77 with context-dependent Huffman coding
- Static dictionary with 120+ common words/phrases
- Quality levels 0-11 (fast to best compression)
- Streaming compression/decompression API
Google's fast compression:
- Simple LZ77 variant optimized for speed
- Block format for in-memory compression
- Framed format with CRC32C checksums
- Streaming Write/Read API
Async I/O support in oxiarc-deflate (enabled via async-io feature):
- async_deflate module: non-blocking DEFLATE compress/decompress
- gzip module: streaming GZip encode/decode with RFC 1952 compliance
- Compatible with tokio and async-std runtimes
Streaming compression/decompression support in oxiarc-deflate:
- GzipStreamEncoder/GzipStreamDecoder with configurable block sizes
- ZlibStreamEncoder/ZlibStreamDecoder with flush modes
- Flush modes: sync_flush, full_flush, partial_flush
| Format | Read | Write | Compression | Checksums | Notes |
|---|---|---|---|---|---|
| ZIP | ✅ | ✅ | DEFLATE, Store | CRC-32 | Zip64 support, data descriptors, async ZIP (async-io feature) |
| TAR | ✅ | ✅ | N/A (container only) | None | UStar, PAX, GNU long names |
| GZIP | ✅ | ✅ | DEFLATE | CRC-32 | RFC 1952 compliant |
| LZH | ✅ | ✅ | lh0-lh7 | CRC-16 | Shift_JIS support, all header levels |
| XZ | ✅ | ✅ | LZMA2 | CRC-64 | Block checksums |
| 7z | ✅ | ❌ | LZMA/LZMA2 | CRC-32 | Read-only, partial support |
| CAB | ✅ | ❌ | None, MSZIP | CRC-32 | Microsoft Cabinet, read-only |
| LZ4 | ✅ | ✅ | LZ4, LZ4-HC | XXHash32 | Frame format, block/content checksums |
| Zstd | ✅ | ✅ | Zstandard | XXHash64 | Frame format with FSE+Huffman |
| Bzip2 | ✅ | ✅ | BWT + Huffman | CRC-32 | Block-sorting compression |
| Brotli | ✅ | ✅ | Brotli (RFC 7932) | None | Quality levels 0-11, static dictionary |
| Snappy | ✅ | ✅ | Snappy | CRC32C | Block and framed formats |
Real-world performance measured on various data types:
| Level | Uniform Data | Text Data | Binary Data |
|---|---|---|---|
| Level 1 (Fast) | 400 MB/s | 85 MB/s | 48 MB/s |
| Level 5 (Normal) | 275 MB/s | 42 MB/s | 13 MB/s |
| Level 9 (Best) | 253 MB/s | 15 MB/s | 0.3 MB/s |
| Operation | Speed Range |
|---|---|
| Forward Transform | 2-11 MB/s |
| Inverse Transform | 60-320 MB/s |
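The gap between forward and inverse throughput in the table above follows from the structure of the transform: the forward direction has to sort all rotations of the block, while the inverse needs only a counting pass and one walk over a mapping array. The sketch below is a naive textbook version (rotation sort with full comparisons), not the key-based sorter in oxiarc-bzip2, and the function names are assumptions.

```rust
/// Forward BWT by sorting all rotations. Returns the last column and the
/// row index of the original string in the sorted rotation matrix.
pub fn bwt_forward(input: &[u8]) -> (Vec<u8>, usize) {
    let n = input.len();
    if n == 0 {
        return (Vec::new(), 0);
    }
    // Each rotation is identified by its starting offset.
    let mut rotations: Vec<usize> = (0..n).collect();
    rotations.sort_by(|&a, &b| {
        let ra = input[a..].iter().chain(input[..a].iter());
        let rb = input[b..].iter().chain(input[..b].iter());
        ra.cmp(rb)
    });
    let primary = rotations.iter().position(|&r| r == 0).unwrap_or(0);
    let last = rotations.iter().map(|&r| input[(r + n - 1) % n]).collect();
    (last, primary)
}

/// Inverse BWT using the last-to-first mapping. Assumes `primary < last.len()`
/// whenever `last` is non-empty.
pub fn bwt_inverse(last: &[u8], primary: usize) -> Vec<u8> {
    let n = last.len();
    if n == 0 {
        return Vec::new();
    }
    // starts[b] = first row of the sorted first column that holds byte b.
    let mut counts = [0usize; 256];
    for &b in last {
        counts[b as usize] += 1;
    }
    let mut starts = [0usize; 256];
    let mut sum = 0;
    for b in 0..256 {
        starts[b] = sum;
        sum += counts[b];
    }
    // lf[i] = row whose first byte is the same occurrence as last[i].
    let mut lf = vec![0usize; n];
    for (i, &b) in last.iter().enumerate() {
        lf[i] = starts[b as usize];
        starts[b as usize] += 1;
    }
    // Walk backwards from the original row, filling the output from the end.
    let mut out = vec![0u8; n];
    let mut row = primary;
    for slot in out.iter_mut().rev() {
        *slot = last[row];
        row = lf[row];
    }
    out
}
```

For `banana` the forward transform yields the last column `nnbaaa` with primary index 3, and the inverse restores the original bytes.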
| Algorithm | Naive | Slicing-by-8 | Speedup |
|---|---|---|---|
| CRC-32 | ~150 MB/s | ~500 MB/s | 3.3x |
| CRC-64 | ~100 MB/s | ~450 MB/s | 4.5x |
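The idea behind slicing-by-8 is to consume eight input bytes per loop iteration with eight independent table lookups, instead of one byte and one dependent lookup at a time. The following is a sketch of the standard technique for the reflected CRC-32 polynomial, not the oxiarc-core code; the function names are assumptions, and the tables are rebuilt on every call only to keep the example short.

```rust
const POLY: u32 = 0xEDB8_8320; // reflected CRC-32 polynomial (ZIP, GZIP, PNG)

/// Build the eight 256-entry lookup tables.
fn make_tables() -> [[u32; 256]; 8] {
    let mut t = [[0u32; 256]; 8];
    for i in 0..256 {
        let mut crc = i as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 { (crc >> 1) ^ POLY } else { crc >> 1 };
        }
        t[0][i] = crc;
    }
    // Table k advances a byte through k additional zero bytes.
    for i in 0..256 {
        for k in 1..8 {
            let prev = t[k - 1][i];
            t[k][i] = (prev >> 8) ^ t[0][(prev & 0xFF) as usize];
        }
    }
    t
}

/// Baseline: one table lookup per input byte.
pub fn crc32_bytewise(data: &[u8]) -> u32 {
    let t = make_tables();
    let mut crc = !0u32;
    for &b in data {
        crc = (crc >> 8) ^ t[0][((crc ^ b as u32) & 0xFF) as usize];
    }
    !crc
}

/// Slicing-by-8: eight lookups per 8-byte chunk, then a bytewise tail.
pub fn crc32_slicing8(data: &[u8]) -> u32 {
    let t = make_tables();
    let mut crc = !0u32;
    let mut chunks = data.chunks_exact(8);
    for c in &mut chunks {
        let lo = u32::from_le_bytes([c[0], c[1], c[2], c[3]]) ^ crc;
        let hi = u32::from_le_bytes([c[4], c[5], c[6], c[7]]);
        crc = t[7][(lo & 0xFF) as usize]
            ^ t[6][((lo >> 8) & 0xFF) as usize]
            ^ t[5][((lo >> 16) & 0xFF) as usize]
            ^ t[4][(lo >> 24) as usize]
            ^ t[3][(hi & 0xFF) as usize]
            ^ t[2][((hi >> 8) & 0xFF) as usize]
            ^ t[1][((hi >> 16) & 0xFF) as usize]
            ^ t[0][(hi >> 24) as usize];
    }
    for &b in chunks.remainder() {
        crc = (crc >> 8) ^ t[0][((crc ^ b as u32) & 0xFF) as usize];
    }
    !crc
}
```

Both functions return the standard check value `0xCBF43926` for the ASCII input `123456789`. A production version would compute the tables once, for example in a `const` or a lazily initialized static.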
OxiArc implements several performance optimizations:
- CRC Slicing-by-8: Hardware-independent 3-5x speedup over byte-at-a-time table lookup
- Optimized Hash Chains: Improved LZ77 pattern matching with multiplication-based hashing
- Lazy Matching: Better compression ratios in DEFLATE with minimal speed impact
- BWT Key-Based Sorting: 4-byte prefix keys for faster block sorting
- Zero-Copy Streaming: Minimizes allocations and memory copies
- Early Rejection: Fast-path optimizations for match finding
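Hash chains with multiplication-based hashing, as named in the list above, can be sketched as follows. This is a generic LZ77 match finder under stated assumptions, not OxiArc's code: the `HashChain` type, the 15-bit table size, the 4-byte minimum match, and the multiplier 2654435761 (a common Knuth-style constant) are all choices made for illustration.

```rust
const HASH_BITS: u32 = 15;
const MIN_MATCH: usize = 4;

/// Multiplicative hash of the next four bytes, keeping the top HASH_BITS bits.
fn hash4(bytes: &[u8]) -> usize {
    let v = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    (v.wrapping_mul(2_654_435_761) >> (32 - HASH_BITS)) as usize
}

/// `head[h]` is the most recent position with hash `h`;
/// `prev[pos]` links to the previous position with the same hash.
pub struct HashChain {
    head: Vec<Option<usize>>,
    prev: Vec<Option<usize>>,
}

impl HashChain {
    pub fn new(data_len: usize) -> Self {
        Self {
            head: vec![None; 1 << HASH_BITS],
            prev: vec![None; data_len],
        }
    }

    /// Register `pos` so that later positions can find it.
    pub fn insert(&mut self, data: &[u8], pos: usize) {
        if pos + MIN_MATCH > data.len() {
            return;
        }
        let h = hash4(&data[pos..]);
        self.prev[pos] = self.head[h];
        self.head[h] = Some(pos);
    }

    /// Longest earlier match for `data[pos..]` as `(distance, length)`,
    /// inspecting at most `max_chain` candidates. Call before `insert(pos)`.
    pub fn longest_match(
        &self,
        data: &[u8],
        pos: usize,
        max_chain: usize,
    ) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > data.len() {
            return None;
        }
        let mut best: Option<(usize, usize)> = None;
        let mut candidate = self.head[hash4(&data[pos..])];
        let mut steps = 0;
        while let Some(c) = candidate {
            if steps == max_chain {
                break;
            }
            let len = data[c..]
                .iter()
                .zip(&data[pos..])
                .take_while(|(a, b)| a == b)
                .count();
            if len >= MIN_MATCH && best.map_or(true, |(_, l)| len > l) {
                best = Some((pos - c, len));
            }
            candidate = self.prev[c];
            steps += 1;
        }
        best
    }
}
```

The `max_chain` limit is the usual speed/ratio knob: low compression levels follow only a few candidates, while high levels walk much longer chains.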
# Create a ZIP archive from files and directories
oxiarc create backup.zip file1.txt file2.pdf documents/
# Create with compression level (store, fast, normal, best)
oxiarc create -l best archive.zip src/ tests/
# Verbose output
oxiarc create -v data.zip folder/

# Create a TAR archive
oxiarc create backup.tar project/
# Combine with compression (tar.gz, tar.xz, tar.bz2, tar.zst)
gzip backup.tar # or use GZIP directly
oxiarc create backup.tar.gz folder/ # Auto-detects .gz extension

# GZIP compression
oxiarc create data.txt.gz large_file.txt
# XZ (LZMA2) compression
oxiarc create database.sql.xz database.sql
oxiarc create -l best archive.xz bigdata.bin
# LZ4 (fast compression)
oxiarc create temp.lz4 file.bin
oxiarc create -l fast logs.lz4 access.log
# Zstandard compression
oxiarc create data.zst large_dataset.csv
# Bzip2 compression
oxiarc create text.bz2 document.txt

# Create LZH archive (Japanese format)
oxiarc create archive.lzh file1.txt file2.txt folder/

# Extract to current directory
oxiarc extract archive.zip
oxiarc extract data.tar.gz
oxiarc extract files.7z
# Extract to specific directory
oxiarc extract archive.zip -o extracted/
oxiarc extract backup.tar.xz -o /tmp/restore/
# Extract with progress bar
oxiarc extract large_archive.zip --progress
# Verbose output (show each file being extracted)
oxiarc extract data.lzh -v

# Extract specific files
oxiarc extract archive.zip file1.txt readme.md
# Extract only files matching patterns (glob syntax)
oxiarc extract backup.zip --include "*.txt"
oxiarc extract data.tar --include "src/**/*.rs"
# Exclude files from extraction
oxiarc extract archive.zip --exclude "test/*" --exclude "*.tmp"
# Combine include and exclude
oxiarc extract backup.zip --include "docs/**" --exclude "*.draft"

# Preserve modification timestamps
oxiarc extract archive.zip -t
# Preserve Unix file permissions
oxiarc extract backup.tar --preserve-permissions
# Preserve all metadata (timestamps + permissions)
oxiarc extract data.tar.gz -p

# Always overwrite (default)
oxiarc extract archive.zip --overwrite
# Skip existing files without prompting
oxiarc extract backup.zip --skip-existing
# Prompt before overwriting each file
oxiarc extract data.zip --prompt

# Decompress from stdin to stdout
cat data.gz | oxiarc extract - -o - > output.txt
curl https://example.com/data.xz | oxiarc extract - --format xz > data.txt
# Extract specific format from stdin
oxiarc extract - --format gzip < compressed.gz > original.txt

# Compress to stdout
oxiarc create - --format gzip < input.txt > output.gz
cat large_file.bin | oxiarc create - --format xz > compressed.xz
# Pipe compression
find . -name "*.log" | tar -cf - -T - | oxiarc create - --format zst > logs.tar.zst

# List files in archive
oxiarc list archive.zip
oxiarc list backup.tar.gz
oxiarc list data.7z
# Verbose listing (show size, date, permissions)
oxiarc list archive.zip -v
# JSON output (machine-readable)
oxiarc list data.lzh --json

# List only matching files
oxiarc list backup.zip --include "*.txt"
oxiarc list archive.tar --include "src/**/*.rs"
# Exclude patterns
oxiarc list data.zip --exclude "test/*"

# Test archive integrity
oxiarc test archive.zip
oxiarc test backup.tar.gz
oxiarc test data.lzh
# Verbose testing (show each file being tested)
oxiarc test archive.7z -v

# Show archive metadata
oxiarc info archive.zip
oxiarc info data.7z
oxiarc info backup.lzh
# Example output:
# Format: ZIP
# Files: 42
# Compressed size: 1.2 MB
# Uncompressed size: 5.4 MB
# Compression ratio: 77.8%

# Detect archive format
oxiarc detect unknown_file.bin
oxiarc detect downloaded_archive
# Useful for files without extensions
oxiarc detect mystery_file

# Convert archive formats
oxiarc convert old.lzh new.zip
oxiarc convert data.7z backup.tar
oxiarc convert legacy.cab modern.zip
# Convert with compression level
oxiarc convert source.zip dest.tar -l best
# Verbose conversion
oxiarc convert old.lzh new.zip -v

Pattern syntax supports glob-style wildcards:
- * matches any characters except /
- ** matches any characters including / (recursive)
- ? matches a single character
- [abc] matches one character from the set
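The wildcard rules above can be expressed as a short recursive matcher. This is a sketch of the described syntax, not the matcher used by oxiarc-cli, and `glob_match` is a hypothetical name. Two behaviors are assumptions that the list does not state: `?` does not match `/`, and a `**` followed by `/` may also match nothing (so `src/**/*.rs` also covers `src/main.rs`), as in many common glob implementations. Character ranges such as `[a-z]` and negation are not handled.

```rust
/// Return true if `path` matches the glob `pattern`.
pub fn glob_match(pattern: &str, path: &str) -> bool {
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(pat: &[u8], text: &[u8]) -> bool {
    match pat.first() {
        // Pattern exhausted: succeed only if the text is exhausted too.
        None => text.is_empty(),
        Some(b'*') => {
            let double = pat.get(1) == Some(&b'*');
            let rest = if double { &pat[2..] } else { &pat[1..] };
            // Assumption: "**/" may match zero directories, slash included.
            if double && rest.first() == Some(&b'/') && match_bytes(&rest[1..], text) {
                return true;
            }
            // Let the wildcard consume 0, 1, 2, ... bytes of the text.
            for i in 0..=text.len() {
                if match_bytes(rest, &text[i..]) {
                    return true;
                }
                // A single '*' must not consume a path separator.
                if i < text.len() && !double && text[i] == b'/' {
                    break;
                }
            }
            false
        }
        Some(b'?') => {
            !text.is_empty() && text[0] != b'/' && match_bytes(&pat[1..], &text[1..])
        }
        Some(b'[') => match pat.iter().position(|&c| c == b']') {
            // "[abc]": the next byte must be one of the listed bytes.
            Some(end) if end > 1 => {
                !text.is_empty()
                    && pat[1..end].contains(&text[0])
                    && match_bytes(&pat[end + 1..], &text[1..])
            }
            // No closing bracket or empty set: treat '[' as a literal.
            _ => !text.is_empty() && text[0] == b'[' && match_bytes(&pat[1..], &text[1..]),
        },
        // Any other byte must match literally.
        Some(&c) => !text.is_empty() && text[0] == c && match_bytes(&pat[1..], &text[1..]),
    }
}
```

Under these rules `*.txt` matches `notes.txt` but not `docs/notes.txt`, while `src/**/*.rs` matches `src/a/b/lib.rs`. The backtracking is exponential in the worst case, which is acceptable for a sketch but not for untrusted patterns.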
# Include only specific file types
oxiarc extract archive.zip --include "*.txt" --include "*.md"
# Recursive pattern matching
oxiarc list backup.tar --include "src/**/*.rs"
oxiarc extract data.zip --include "docs/**/*.pdf"
# Complex filtering
oxiarc extract backup.zip \
--include "src/**" \
--exclude "src/test/**" \
--exclude "**/*.tmp"

use oxiarc_deflate::{deflate, inflate};
use oxiarc_core::error::Result;

fn main() -> Result<()> {
    // DEFLATE compression
    let data = b"Hello, World! This is a test.";
    let compressed = deflate(data, 6)?; // Level 6 compression
    let decompressed = inflate(&compressed)?;
    assert_eq!(data, &decompressed[..]);
    Ok(())
}

use oxiarc_archive::ZipReader;
use std::fs::File;
use std::io::Read;

fn read_zip() -> oxiarc_core::error::Result<()> {
    // Open ZIP archive
    let file = File::open("archive.zip")?;
    let mut zip = ZipReader::new(file)?;

    // List entries
    for entry in zip.entries() {
        println!("{}: {} bytes (compressed: {})",
            entry.name,
            entry.size,
            entry.compressed_size
        );
    }

    // Extract specific file
    let mut data = Vec::new();
    zip.extract_by_name("readme.txt", &mut data)?;
    println!("Content: {}", String::from_utf8_lossy(&data));
    Ok(())
}

use oxiarc_archive::zip::{ZipWriter, ZipCompressionLevel};
use std::fs::File;

fn create_zip() -> oxiarc_core::error::Result<()> {
    let file = File::create("output.zip")?;
    let mut zip = ZipWriter::new(file);

    // Add file with compression
    zip.add_file(
        "hello.txt",
        b"Hello, World!",
        ZipCompressionLevel::Normal
    )?;

    // Add directory
    zip.add_directory("docs/")?;

    // Finalize archive
    zip.finish()?;
    Ok(())
}

use oxiarc_lzma::{compress, decompress, LzmaLevel};

fn lzma_example() -> oxiarc_core::error::Result<()> {
    let data = b"This is test data for LZMA compression";

    // Compress with LZMA
    let compressed = compress(data, LzmaLevel::DEFAULT)?;

    // Decompress
    let decompressed = decompress(&compressed)?;
    assert_eq!(data, &decompressed[..]);
    Ok(())
}

use oxiarc_bzip2::{compress, decompress, CompressionLevel};

fn bzip2_example() -> oxiarc_core::error::Result<()> {
    let data = b"Data to compress with Bzip2";

    // Compress (levels 1-9)
    let compressed = compress(data, CompressionLevel::Best)?;

    // Decompress
    let decompressed = decompress(&compressed)?;
    assert_eq!(data, &decompressed[..]);
    Ok(())
}

use oxiarc_lz4::{compress_frame, decompress_frame};

fn lz4_example() -> oxiarc_core::error::Result<()> {
    let data = b"Fast compression with LZ4";

    // Compress (very fast)
    let compressed = compress_frame(data)?;

    // Decompress
    let decompressed = decompress_frame(&compressed)?;
    assert_eq!(data, &decompressed[..]);
    Ok(())
}

use oxiarc_archive::ArchiveFormat;
use std::fs::File;

fn detect_format() -> oxiarc_core::error::Result<()> {
    let mut file = File::open("unknown.bin")?;
    let (format, magic) = ArchiveFormat::detect(&mut file)?;
    println!("Detected format: {}", format);
    println!("Magic bytes: {:02X?}", magic);
    if format.is_archive() {
        println!("This is a multi-file archive");
    } else if format.is_compression_only() {
        println!("This is single-file compression");
    }
    Ok(())
}

# Build all crates
cargo build --release
# Run all 1041 tests
cargo nextest run --all-features
# Build CLI only
cargo build --release -p oxiarc-cli
# Install CLI
cargo install --path oxiarc-cli

- Rust 1.85+ (Edition 2024)
- No external C libraries or compression dependencies
- Optional: indicatif for progress bars (CLI only)
We welcome contributions to OxiArc! Please follow these guidelines:
OxiArc is part of the COOLJAPAN ecosystem and follows strict development policies:
- No C/Fortran dependencies - All code must be pure Rust
- If C/Fortran bindings are absolutely necessary, they must be feature-gated
- Default features must be 100% pure Rust
- Code must compile with zero warnings
- Run cargo clippy and fix all warnings before submitting
- Use cargo nextest run --all-features to verify
- Avoid using .unwrap(), .expect(), or panicking code in production
- Use proper error handling with Result<T, E>
- Provide meaningful error messages
- Use workspace-level dependency management
- Set *.workspace = true in crate Cargo.toml files
- No version specifications in individual crates (except keywords/categories)
- Always use the latest stable versions from crates.io
- Keep dependencies up to date
- Keep individual source files under 2000 lines
- Use splitrs tool for refactoring large files
- Check with rslines 50 to find refactoring targets
- Fork and Clone

      git clone https://github.com/YOUR_USERNAME/oxiarc
      cd oxiarc

- Create a Branch

      git checkout -b feature/your-feature-name

- Make Changes
  - Follow Rust naming conventions (snake_case for variables/functions)
  - Add tests for new functionality
  - Update documentation and examples
  - Run tests: cargo nextest run --all-features
  - Check code: cargo clippy --all-features
- Test Thoroughly

      # Run all tests
      cargo nextest run --all-features
      # Check for warnings
      cargo clippy --all-features
      # Check formatting
      cargo fmt --check
      # Run benchmarks (if applicable)
      cargo bench

- Commit Changes
  - Write clear, descriptive commit messages
  - Reference issue numbers if applicable
  - DO NOT commit unless explicitly ready
  - NEVER use cargo publish without permission
- Submit Pull Request
  - Describe your changes clearly
  - Reference related issues
  - Ensure CI passes
  - Wait for review from maintainers
- Follow standard Rust conventions
- Use rustfmt for formatting: cargo fmt
- Document public APIs with doc comments (///)
- Include examples in documentation where helpful
- Prefer explicit over implicit
- Think deeply about implementations (ultrathink mode)
- Write unit tests for new functionality
- Add integration tests for complex features
- Include edge case testing
- Use temporary directories for file operations: std::env::temp_dir()
- Aim for high test coverage
- Update README.md for user-facing changes
- Update TODO.md for development progress
- Add API documentation for public items
- Include usage examples
- Keep documentation accurate and up-to-date
- Use criterion for benchmarks
- Place benchmarks in benches/ directory
- Document benchmark methodology
- Include various data patterns (uniform, random, text, binary)
When reporting issues, please include:
- Rust version (rustc --version)
- OxiArc version
- Operating system and architecture
- Minimal reproduction example
- Expected vs actual behavior
- Any relevant error messages
- Describe the use case clearly
- Explain why the feature would be useful
- Provide examples of how it would be used
- Consider implementation complexity
When adding new formats or algorithms:
- Follow the existing layered architecture
- Core algorithms go in appropriate codec crates
- Format support goes in oxiarc-archive
- CLI features go in oxiarc-cli
- Share common code through oxiarc-core
- Be respectful and constructive
- Help others in issues and discussions
- Share knowledge and expertise
- Follow the Rust Code of Conduct
OxiArc is developed and maintained by COOLJAPAN OU (Team Kitasan).
If you find OxiArc useful, please consider sponsoring the project to support continued development of the Pure Rust ecosystem.
https://github.com/sponsors/cool-japan
Your sponsorship helps us:
- Maintain and improve the COOLJAPAN ecosystem
- Keep the entire ecosystem (OxiGDAL, OxiMedia, OxiBLAS, OxiFFT, SciRS2, etc.) 100% Pure Rust
- Provide long-term support and security updates
Licensed under the Apache License, Version 2.0 (LICENSE or http://www.apache.org/licenses/LICENSE-2.0).
https://github.com/cool-japan/oxiarc
COOLJAPAN OU contact@cooljapan.tech