
Experiment: SQuat query-orthogonal error projection #11

@TheTom

Description


Hypothesis

Projecting the key quantization error into the orthogonal complement of the query subspace reduces the effective error seen by the attention computation for head_dim=128.

Background

SQuat (arXiv:2503.24358) observes that the attention-score error from key quantization is q·e, so the component of the error orthogonal to the query subspace has no effect; only the in-subspace component perturbs attention scores, and steering the error into the orthogonal complement removes it. After FWHT rotation, the rotated Q subspace may still be low-rank, making this projection worthwhile.
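The geometry can be sketched numerically. This is a minimal illustration, not the SQuat algorithm: the subspace rank `r`, building the basis from an SVD of sampled queries, and all names here are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 128, 8  # head_dim; assumed low rank of the query subspace

# Pretend r recent queries span the query subspace; orthonormalize them.
Q = rng.standard_normal((r, d))
U, _, _ = np.linalg.svd(Q.T, full_matrices=False)  # (d, r) orthonormal basis

k = rng.standard_normal(d)           # a key vector
e = 0.1 * rng.standard_normal(d)     # its quantization error
e_perp = e - U @ (U.T @ e)           # project out the in-subspace component

q = Q[0]                             # a query lying inside the subspace
raw_err = abs(q @ (k + e) - q @ k)        # score error with raw error
proj_err = abs(q @ (k + e_perp) - q @ k)  # score error after projection
assert proj_err < 1e-9               # vanishes for in-subspace queries
```

For queries exactly in the subspace the score error after projection is zero up to floating-point noise; for real queries the benefit depends on how well the low-rank basis captures them.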

What to test

  • Implement query-orthogonal error projection in dequant path
  • PPL on head_dim=128 models
  • Interaction with pre-rotate-queries optimization (both operate on Q)
  • Decode speed impact (additional projection per token)
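One possible shape for the dequant-path change, sketched under assumptions: store the r in-subspace error coefficients alongside each quantized key, and add them back at dequant time so the residual error is query-orthogonal. The int4 scheme, the helper names (`quant_key`, `dequant_key`), and the per-key correction vector are all illustrative, not the repo's actual quant format.

```python
import numpy as np

def quantize_int4(x):
    # Simple symmetric per-vector int4 quant, for illustration only.
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def quant_key(k, U):
    # U: (d, r) orthonormal basis of the (assumed low-rank) query subspace.
    q, s = quantize_int4(k)
    e = k - dequantize(q, s)
    coeff = U.T @ e            # r extra floats stored per key
    return q, s, coeff

def dequant_key(q, s, coeff, U):
    # Adding back the in-subspace error leaves a residual orthogonal to U,
    # at the cost of one (d, r) matvec per key per decode step.
    return dequantize(q, s) + U @ coeff
```

The decode-speed question in the list above reduces to the cost of that extra d×r matvec per key (plus the r floats of storage), which is also where the interaction with pre-rotated queries would surface, since both paths touch the Q basis.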

Expected outcome

Could close the head_dim=128 PPL gap, but it is more complex than the CAT diagonal approach; try CAT first.

Priority

Low — depends on CAT diagonal results, more complex implementation.

Source

AutoRepl: TODO-001 (buun, fork_dc582a), arXiv:2503.24358
