Summary
Extend the .apr format to support web-scale models (10B+ parameters) while maintaining its WASM-first design.
Current State
APR v1 works well for small and medium models but lacks:
- Tensor alignment: no 32/64-byte alignment, so tensors cannot be accessed zero-copy through mmap
- Streaming decompression: LZ4 is mentioned in the spec but not implemented
- Sharding: the bundle system exists but is not integrated with the APR reader
Proposed APR v2 Features
Tier 1: Alignment (mmap-ready)
- 32-byte tensor alignment (matches GGUF)
- Padding bytes between tensors
- Zero-copy slice access
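The alignment rule above can be sketched as follows. This is a minimal illustration, not the APR writer: the names `TENSOR_ALIGNMENT`, `align_up`, `padding_for`, and `layout_offsets` are hypothetical, and the sketch assumes tensor data starts at some known offset after the header.

```rust
/// Assumed alignment for APR v2 tensor data (the 32-byte figure from this proposal).
const TENSOR_ALIGNMENT: usize = 32;

/// Round `offset` up to the next multiple of `align`, which must be a power of two.
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Number of padding bytes to emit before a tensor that would otherwise start at `offset`.
fn padding_for(offset: usize, align: usize) -> usize {
    align_up(offset, align) - offset
}

/// Compute the aligned start offset of each tensor, given their byte lengths.
/// `data_start` is the (hypothetical) offset where the tensor data region begins.
fn layout_offsets(data_start: usize, lens: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(lens.len());
    let mut cursor = data_start;
    for &len in lens {
        cursor = align_up(cursor, TENSOR_ALIGNMENT);
        offsets.push(cursor);
        cursor += len;
    }
    offsets
}
```

With every tensor starting on a 32-byte boundary, a reader could hand out slices of the mapped file directly, provided the mapping itself is at least 32-byte aligned (page-aligned mmap regions are).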
Tier 2: Compression
- LZ4 block compression (64KB blocks)
- Per-tensor compression flag
- Streaming decompression for WASM
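One way to picture block-wise streaming decompression is the sketch below. It is an assumption-laden illustration: `decode_streaming` and `block_count` are hypothetical names, and the codec is passed in as a closure rather than calling a real LZ4 library, so the loop shows only the framing (one 64KB scratch buffer reused per block), not the compression itself.

```rust
/// Block size from this proposal: 64KB of uncompressed data per block.
const BLOCK_SIZE: usize = 64 * 1024;

/// Number of blocks needed for a tensor of `len` uncompressed bytes.
fn block_count(len: usize) -> usize {
    (len + BLOCK_SIZE - 1) / BLOCK_SIZE
}

/// Decode a tensor one block at a time, reusing a single scratch buffer so that
/// peak memory stays near one block regardless of tensor size.
///
/// `decompress` is a placeholder for the real codec (for example an LZ4 block
/// decoder); `sink` receives each decoded block in order.
fn decode_streaming<'a, I, D, S>(
    blocks: I,
    mut decompress: D,
    mut sink: S,
) -> Result<usize, String>
where
    I: IntoIterator<Item = &'a [u8]>,
    D: FnMut(&[u8], &mut Vec<u8>) -> Result<(), String>,
    S: FnMut(&[u8]),
{
    let mut scratch = Vec::with_capacity(BLOCK_SIZE);
    let mut total = 0;
    for block in blocks {
        scratch.clear();
        decompress(block, &mut scratch)?;
        // Reject blocks that expand beyond the agreed block size.
        if scratch.len() > BLOCK_SIZE {
            return Err(format!(
                "block expands to {} bytes, limit is {}",
                scratch.len(),
                BLOCK_SIZE
            ));
        }
        total += scratch.len();
        sink(scratch.as_slice());
    }
    Ok(total)
}
```

Keeping the codec behind a closure also makes the per-tensor compression flag easy to honor: an uncompressed tensor can use a pass-through closure that simply copies the input block.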
Tier 3: Sharding (web-scale)
- Manifest file for multi-part models
- Tensor-level sharding
- Progressive loading support
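A manifest for tensor-level sharding might look roughly like the sketch below. Nothing here is the existing bundle format: `Manifest`, `TensorLocation`, and both methods are hypothetical, and the on-disk encoding of the manifest is left open.

```rust
use std::collections::HashMap;

/// Hypothetical location of one tensor inside a sharded model.
#[derive(Debug, Clone, PartialEq)]
struct TensorLocation {
    shard: usize, // index into `Manifest::shards`
    offset: u64,  // byte offset within that shard file
    len: u64,     // tensor length in bytes
}

/// Hypothetical manifest: the list of shard files plus a tensor-to-shard index.
#[derive(Debug, Default)]
struct Manifest {
    shards: Vec<String>,
    tensors: HashMap<String, TensorLocation>,
}

impl Manifest {
    /// Shard file that must be available before `tensor` can be read.
    fn shard_for(&self, tensor: &str) -> Option<&str> {
        let loc = self.tensors.get(tensor)?;
        self.shards.get(loc.shard).map(String::as_str)
    }

    /// Shards needed for a set of tensors, in manifest order and without duplicates.
    /// Unknown tensor names are ignored.
    fn shards_needed(&self, tensors: &[&str]) -> Vec<&str> {
        let mut indices: Vec<usize> = tensors
            .iter()
            .filter_map(|name| self.tensors.get(*name).map(|loc| loc.shard))
            .collect();
        indices.sort_unstable();
        indices.dedup();
        indices
            .into_iter()
            .filter_map(|i| self.shards.get(i).map(String::as_str))
            .collect()
    }
}
```

Progressive loading then reduces to asking which shards the next layers need and fetching only those, rather than downloading every part up front.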
Backward Compatibility
- Magic: APR2 (vs APR1)
- APR1 reader can skip unknown sections
- Converter tool: apr1-to-apr2
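Version detection from the magic could be as small as the sketch below. It assumes the magic is the first four bytes of the file and is stored as the ASCII strings `APR1` and `APR2`; `AprVersion` and `detect_version` are hypothetical names, not the current reader API.

```rust
/// Format versions distinguished by the file magic.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AprVersion {
    V1,
    V2,
}

/// Inspect the first four bytes of a file and report which APR version it claims.
/// Returns `None` for short input or an unrecognized magic.
fn detect_version(header: &[u8]) -> Option<AprVersion> {
    match header.get(..4)? {
        b"APR1" => Some(AprVersion::V1),
        b"APR2" => Some(AprVersion::V2),
        _ => None,
    }
}
```

A reader or the converter tool could dispatch on this result before parsing anything else, so that files with an unknown magic are rejected early.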
Non-Goals
- Not replacing GGUF (interop via export)
- Not supporting training (inference only)
References
- Current spec: docs/specifications/model-format-spec.md
- GGUF reference: format/gguf.rs
- Bundle system: bundle/