forked from ggml-org/llama.cpp
Experiment: SQuat query-orthogonal error projection #11
Hypothesis
Projecting the quantization error perpendicular to the query subspace reduces the effective error in the attention computation for head_dim=128 models.
Background
SQuat (arXiv:2503.24358) observes that the component of key quantization error orthogonal to the query subspace does not affect attention scores; only the component lying within the query subspace perturbs q·k. After the FWHT rotation, the rotated Q subspace may still be effectively low-rank, which would make projecting the error out of that subspace worthwhile.
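The core identity is elementary: q·(k + e) = q·k + q·e, and q·e depends only on the component of e along q. A minimal sketch (illustrative only, not llama.cpp code) for the rank-1 case, where the error is projected onto the orthogonal complement of a single query vector:

```python
# Sketch: only the component of the key quantization error e that lies
# along the query q perturbs the attention logit q.k; projecting e onto
# the orthogonal complement of q leaves the logit unchanged.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(e, q):
    """Remove the component of e along q: e_perp = e - (q.e / q.q) * q."""
    scale = dot(q, e) / dot(q, q)
    return [ei - scale * qi for ei, qi in zip(e, q)]

q = [1.0, 2.0, -1.0, 0.5]
k = [0.3, -0.7, 1.1, 2.0]
e = [0.05, -0.02, 0.04, 0.01]   # raw quantization error on the key

e_perp = project_out(e, q)

logit_exact = dot(q, k)
logit_raw   = dot(q, [ki + ei for ki, ei in zip(k, e)])
logit_proj  = dot(q, [ki + ei for ki, ei in zip(k, e_perp)])

print(abs(logit_raw - logit_exact) > 1e-9)   # raw error shifts the logit
print(abs(logit_proj - logit_exact) < 1e-9)  # projected error does not
```

The projection changes the stored error vector, not the key itself, so the dequantized key moves, but only in directions the query cannot see.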
What to test
- Implement query-orthogonal error projection in the dequant path
- Measure PPL on head_dim=128 models
- Check interaction with the pre-rotate-queries optimization (both operate on Q)
- Measure decode speed impact (one additional projection per token)
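For the multi-query case the same idea extends to a low-rank query subspace: build an orthonormal basis for a few representative query vectors, then project the error onto the orthogonal complement of their span. A hedged sketch (names are illustrative, not llama.cpp APIs; the choice of representative queries is exactly what the experiment would need to decide):

```python
# Sketch: project the dequantization error out of a low-rank query
# subspace. Gram-Schmidt builds an orthonormal basis for the span of the
# representative queries; any query in that span then sees zero error.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors, eps=1e-12):
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(dot(w, w))
        if n > eps:                      # drop near-dependent vectors
            basis.append([wi / n for wi in w])
    return basis

def project_out_subspace(e, basis):
    out = list(e)
    for b in basis:
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

queries = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0]]        # toy rank-2 query subspace
basis = gram_schmidt(queries)

e = [0.05, -0.03, 0.02, 0.04]
e_perp = project_out_subspace(e, basis)

# Any linear combination of the representative queries sees no error.
q_mix = [2.0 * a + 0.5 * b for a, b in zip(*queries)]
print(abs(dot(q_mix, e_perp)) < 1e-9)
```

The per-token cost is one projection per basis vector, which is the decode-speed question above: if the rotated Q subspace really is low-rank, a rank-r projection is r dot products plus r AXPYs per key.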
Expected outcome
Could close the head_dim=128 PPL gap, but it is more complex than the CAT diagonal approach; try CAT first.
Priority
Low: depends on CAT diagonal results and requires a more complex implementation.
Source
AutoRepl: TODO-001 (buun, fork_dc582a), arXiv:2503.24358