APR-FORMAT-002: APR v2 format spec for web-scale models #119

@noahgift

Summary

Extend the .apr format to support web-scale models (10B+ parameters) while maintaining its WASM-first design.

Current State

APR v1 works well for small-to-medium models but lacks:

  1. Tensor alignment - No 32/64-byte alignment for zero-copy mmap
  2. Streaming decompression - LZ4 mentioned in spec but not implemented
  3. Sharding - Bundle system exists but not integrated with APR reader

Proposed APR v2 Features

Tier 1: Alignment (mmap-ready)

  • 32-byte tensor alignment (matches GGUF)
  • Padding bytes between tensors
  • Zero-copy slice access
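
A minimal sketch of how Tier 1 could work, assuming tensor payloads are laid out back to back with zero padding in between. The names `TENSOR_ALIGNMENT`, `align_up`, `write_aligned`, and `tensor_slice` are hypothetical and not taken from the current codebase; the 32-byte value follows the bullet above.

```rust
/// Alignment assumed for APR v2 tensor data (32 bytes, per the proposal).
const TENSOR_ALIGNMENT: usize = 32;

/// Round `offset` up to the next multiple of `align` (must be a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Append a tensor's bytes to `buf`, zero-padding first so the tensor starts
/// on an aligned offset. Returns the offset at which the tensor data begins.
fn write_aligned(buf: &mut Vec<u8>, tensor: &[u8]) -> usize {
    let start = align_up(buf.len(), TENSOR_ALIGNMENT);
    buf.resize(start, 0); // padding bytes between tensors
    buf.extend_from_slice(tensor);
    start
}

/// Borrow a tensor's bytes directly from the file buffer (for example an
/// mmap'd region) without copying. Returns `None` for a misaligned offset
/// or an out-of-bounds range.
fn tensor_slice(file: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    if offset % TENSOR_ALIGNMENT != 0 {
        return None;
    }
    file.get(offset..offset.checked_add(len)?)
}
```

Because every tensor starts on an aligned offset, a reader that maps the file at a page-aligned address can hand out slices like the one returned by `tensor_slice` without an intermediate buffer.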

Tier 2: Compression

  • LZ4 block compression (64KB blocks)
  • Per-tensor compression flag
  • Streaming decompression for WASM
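
The block framing for Tier 2 might look roughly like the sketch below. It assumes each compressed block is prefixed with a little-endian `u32` length, which is an illustrative framing and not something the current spec defines. `FLAG_LZ4` and `stream_tensor` are hypothetical names, and the actual LZ4 codec is passed in as a closure so the sketch stays dependency-free.

```rust
/// Uncompressed block size from the proposal (64 KB).
const BLOCK_SIZE: usize = 64 * 1024;

/// Hypothetical per-tensor flag bit: set when the payload is block-compressed.
const FLAG_LZ4: u8 = 0b0000_0001;

/// Walk a payload framed as `[u32 LE compressed_len][block bytes]...` and pass
/// each decompressed block to `sink` as soon as it is ready, so only one block
/// is held in memory at a time. `decompress` stands in for a real LZ4 block
/// decoder (assumption: it appends the decoded bytes to the output vector).
fn stream_tensor<D, S>(
    flags: u8,
    payload: &[u8],
    mut decompress: D,
    mut sink: S,
) -> Result<(), String>
where
    D: FnMut(&[u8], &mut Vec<u8>) -> Result<(), String>,
    S: FnMut(&[u8]),
{
    if flags & FLAG_LZ4 == 0 {
        // Uncompressed tensor: forward fixed-size chunks directly.
        for chunk in payload.chunks(BLOCK_SIZE) {
            sink(chunk);
        }
        return Ok(());
    }
    let mut pos = 0usize;
    let mut scratch = Vec::with_capacity(BLOCK_SIZE);
    while pos < payload.len() {
        let header = payload.get(pos..pos + 4).ok_or("truncated block header")?;
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        pos += 4;
        let end = pos.checked_add(len).ok_or("block length overflow")?;
        let block = payload.get(pos..end).ok_or("truncated block")?;
        pos = end;
        scratch.clear();
        decompress(block, &mut scratch)?;
        if scratch.len() > BLOCK_SIZE {
            return Err("decompressed block exceeds 64 KB".to_string());
        }
        sink(&scratch);
    }
    Ok(())
}
```

Bounding each block at 64 KB keeps peak memory predictable in a WASM host, since the reader never needs the whole decompressed tensor at once.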

Tier 3: Sharding (web-scale)

  • Manifest file for multi-part models
  • Tensor-level sharding
  • Progressive loading support
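
One way the Tier 3 manifest could be modelled is sketched below. The field names, the shard file naming, and the `Manifest` and `TensorEntry` types are all hypothetical; the proposal does not yet define a manifest schema.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical manifest entry: where one tensor lives.
struct TensorEntry {
    name: String,
    shard: usize, // index into `Manifest::shards`
    offset: u64,  // byte offset inside the shard file
    len: u64,     // byte length of the tensor payload
}

/// Hypothetical manifest for a multi-part model.
struct Manifest {
    shards: Vec<String>,
    tensors: Vec<TensorEntry>,
}

impl Manifest {
    /// Shard files that must be fetched to serve the requested tensors,
    /// deduplicated and in ascending shard order.
    fn shards_needed(&self, wanted: &[&str]) -> Vec<&str> {
        let indices: BTreeSet<usize> = self
            .tensors
            .iter()
            .filter(|t| wanted.contains(&t.name.as_str()))
            .map(|t| t.shard)
            .collect();
        indices
            .into_iter()
            .filter_map(|i| self.shards.get(i).map(String::as_str))
            .collect()
    }

    /// Shard file and byte range for a single tensor, for example to issue
    /// a ranged fetch during progressive loading.
    fn byte_range(&self, name: &str) -> Option<(&str, Range<u64>)> {
        let t = self.tensors.iter().find(|t| t.name == name)?;
        let shard = self.shards.get(t.shard)?;
        Some((shard.as_str(), t.offset..t.offset.checked_add(t.len)?))
    }
}
```

With tensor-level entries like these, a loader could fetch only the shards backing the first layers, start inference setup, and request the remaining shards in the background.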

Backward Compatibility

  • Magic: APR2 (vs APR1)
  • APR1 reader can skip unknown sections
  • Converter tool: apr1-to-apr2
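
Version detection from the magic could be sketched as follows, assuming the magic is the four ASCII bytes `APR1` or `APR2` at offset 0. The byte layout, `AprVersion`, and `detect_version` are assumptions for illustration, not the existing reader's API.

```rust
/// Format versions distinguished by the magic (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AprVersion {
    V1,
    V2,
}

/// Inspect the first four bytes of a file and report which APR version it
/// claims to be. Returns `None` for short input or an unrecognised magic.
fn detect_version(file: &[u8]) -> Option<AprVersion> {
    match file.get(..4)? {
        b"APR1" => Some(AprVersion::V1),
        b"APR2" => Some(AprVersion::V2),
        _ => None,
    }
}
```

A v2-aware reader could dispatch on the returned value and fall back to the existing v1 path for `AprVersion::V1`.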

Non-Goals

  • Not replacing GGUF (interop via export)
  • Not supporting training (inference only)

References

  • Current spec: docs/specifications/model-format-spec.md
  • GGUF reference: format/gguf.rs
  • Bundle system: bundle/

Metadata

Labels

  • enhancement: New feature or request
