Summary
Add optional LZ4 compression to the .apr model format to reduce storage size and improve transfer speeds.
Motivation
- .apr model files can be large (MBs to GBs)
- Float32/float16 tensors can compress 2-10x with LZ4, depending on the data (sparse or quantized weights compress best)
- Faster uploads to Hugging Face Hub
- Reduced disk usage for model storage
- Synergy with trueno-zram for runtime decompression
Proposed Changes
- Add Compression enum to AprWriter/AprReader
- Support LZ4 compression (via trueno's kernel)
- Backward compatible: uncompressed files still work
- Magic bytes indicate compression type
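
As a rough illustration of the last two bullets, the sketch below maps a 4-byte magic prefix to a compression type. The magic values (`APR0`, `APRL`, `APRZ`) and the `from_header` helper are assumptions made up for this example; the real .apr magic bytes are not specified in this issue. The idea is that the existing uncompressed magic keeps mapping to `Compression::None`, so older files remain readable.

```rust
/// Compression variants named in this proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Lz4,
    Zstd,
}

// Hypothetical magic values for illustration only; the actual .apr
// header layout is not defined in this issue.
const MAGIC_NONE: &[u8] = b"APR0"; // stands in for the existing uncompressed magic
const MAGIC_LZ4: &[u8] = b"APRL";
const MAGIC_ZSTD: &[u8] = b"APRZ";

impl Compression {
    /// Inspect the first four bytes of a file and return the compression
    /// type, or `None` (the `Option` variant) if the header is too short
    /// or the magic is not recognized.
    pub fn from_header(header: &[u8]) -> Option<Compression> {
        let magic = header.get(..4)?;
        if magic == MAGIC_NONE {
            Some(Compression::None)
        } else if magic == MAGIC_LZ4 {
            Some(Compression::Lz4)
        } else if magic == MAGIC_ZSTD {
            Some(Compression::Zstd)
        } else {
            None
        }
    }
}
```

Returning `Option` rather than defaulting to uncompressed is one possible choice: it lets the reader reject unknown or truncated headers instead of misreading compressed bytes as raw tensors.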
API Design
// Writing compressed model
let writer = AprWriter::new()
    .with_compression(Compression::Lz4)
    .create("model.apr")?;

// Reading auto-detects compression
let reader = AprReader::open("model.apr")?;
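
To make the builder flow above concrete, here is a minimal sketch of how the writer and reader could fit together. It is not the actual aprender implementation. `create_in` and `open_from` are hypothetical stand-ins for `create(path)` and `open(path)` that work on any `Write`/`Read` so the example runs without touching disk, the magic values are invented, and the LZ4 encoding of the payload itself is left out (per the proposal it would be delegated to trueno).

```rust
use std::io::{self, Read, Write};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    None,
    Lz4,
}

impl Compression {
    // Hypothetical magic values, not the real .apr header.
    fn magic(self) -> &'static [u8; 4] {
        match self {
            Compression::None => b"APR0",
            Compression::Lz4 => b"APRL",
        }
    }
}

/// Builder that records the requested compression before anything is written.
pub struct AprWriter {
    compression: Compression,
}

impl AprWriter {
    /// Defaults to no compression, so existing callers keep their behavior.
    pub fn new() -> Self {
        AprWriter { compression: Compression::None }
    }

    pub fn with_compression(mut self, compression: Compression) -> Self {
        self.compression = compression;
        self
    }

    /// Hypothetical stand-in for `create(path)`: writes only the header
    /// to the given sink and hands the sink back.
    pub fn create_in<W: Write>(self, mut sink: W) -> io::Result<W> {
        sink.write_all(self.compression.magic())?;
        Ok(sink)
    }
}

pub struct AprReader {
    compression: Compression,
}

impl AprReader {
    /// Hypothetical stand-in for `open(path)`: reads the magic bytes and
    /// selects the compression type without any hint from the caller.
    pub fn open_from<R: Read>(mut source: R) -> io::Result<Self> {
        let mut magic = [0u8; 4];
        source.read_exact(&mut magic)?;
        let compression = match &magic {
            b"APR0" => Compression::None,
            b"APRL" => Compression::Lz4,
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    "unknown .apr magic",
                ))
            }
        };
        Ok(AprReader { compression })
    }

    pub fn compression(&self) -> Compression {
        self.compression
    }
}
```

With this shape, `AprWriter::new().with_compression(Compression::Lz4).create_in(Vec::new())` yields a buffer that `AprReader::open_from(&buf[..])` identifies as LZ4 with no extra arguments, which is the auto-detection behavior the API design asks for.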
Integration
- trueno-gpu: GPU batch compression for large models
- trueno-zram: CPU SIMD compression fallback
- batuta: Orchestration for model export/import
Acceptance Criteria
- Compression enum: None, Lz4, Zstd
- AprWriter::with_compression() method
- AprReader auto-detects compression from header