forked from ggml-org/llama.cpp
Experiment: Online calibration for alignment correction #10
Open
Description
Hypothesis
Accumulating per-channel statistics during the first N prefill tokens enables CAT-style alignment correction without offline calibration.
Background
CAT diagonal alignment requires per-channel variance statistics. Instead of offline calibration, which needs a separate pass, accumulate the statistics during the first 64 prefill tokens and apply the correction to the remaining tokens. This fits the graph-side WHT architecture well, since the stats-gathering step can be inserted as a graph node.
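A minimal sketch of the accumulation step, assuming a streaming (Welford-style) update so that no activations need to be stored. The class name `ChannelStats` and its methods are hypothetical, chosen only for illustration; the real implementation would live in a graph node, and the issue does not specify how the statistics are computed.

```python
class ChannelStats:
    """Running per-channel mean and variance (Welford's algorithm).

    Hypothetical helper for illustration; not part of llama.cpp.
    """

    def __init__(self, n_channels):
        self.count = 0
        self.mean = [0.0] * n_channels
        self.m2 = [0.0] * n_channels  # running sum of squared deviations

    def update(self, token):
        """Fold one token's activation vector into the running statistics."""
        self.count += 1
        for c, x in enumerate(token):
            delta = x - self.mean[c]
            self.mean[c] += delta / self.count
            self.m2[c] += delta * (x - self.mean[c])

    def variance(self):
        """Population variance per channel over the tokens seen so far."""
        if self.count == 0:
            raise ValueError("no tokens accumulated yet")
        return [m / self.count for m in self.m2]


# Toy input: 3 tokens, 2 channels.
stats = ChannelStats(2)
for tok in [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]:
    stats.update(tok)
```

Each update costs one pass over the channels of a single token, which is the per-token cost the prefill-overhead measurement would need to capture.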
What to test
- Accumulate channel mean/variance during first 64 prefill tokens
- Apply CAT diagonal correction to tokens 65+
- Compare quality vs offline calibration vs no calibration
- Measure prefill overhead from stats accumulation
- Test sensitivity to calibration_tokens count (32, 64, 128)
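The test plan above could be prototyped offline before touching the graph code. The sketch below is only a stand-in under stated assumptions: the activations are synthetic Gaussian data, the full-sequence variance plays the role of offline calibration, and the `1/sqrt(var + eps)` scale is a placeholder diagonal correction, not the actual CAT formula (which this issue does not define). All function names are hypothetical. For brevity the window statistics are computed in one batch rather than streamed.

```python
import math
import random


def channel_variance(tokens):
    """Population variance per channel over a list of token vectors."""
    n = len(tokens)
    n_ch = len(tokens[0])
    mean = [sum(t[c] for t in tokens) / n for c in range(n_ch)]
    return [sum((t[c] - mean[c]) ** 2 for t in tokens) / n for c in range(n_ch)]


def run_prefill(tokens, calibration_tokens, eps=1e-6):
    """Leave the calibration window unchanged, then scale the remaining tokens."""
    window = tokens[:calibration_tokens]
    var = channel_variance(window)
    # ASSUMPTION: placeholder diagonal scale, not the real CAT correction.
    scale = [1.0 / math.sqrt(v + eps) for v in var]
    corrected = [[x * s for x, s in zip(t, scale)]
                 for t in tokens[calibration_tokens:]]
    return window + corrected, var


def make_tokens(n_tokens, stddevs, seed=0):
    """Synthetic activations: one zero-mean Gaussian per channel."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, s) for s in stddevs] for _ in range(n_tokens)]


tokens = make_tokens(512, stddevs=[0.5, 1.0, 4.0])
reference = channel_variance(tokens)  # stand-in for offline calibration

for n in (32, 64, 128):
    out, var = run_prefill(tokens, n)
    rel_err = max(abs(v - r) / r for v, r in zip(var, reference))
    print(f"calibration_tokens={n}: max relative variance error {rel_err:.3f}")
```

Real activations are unlikely to be stationary Gaussians, so the error trend from this toy data says nothing about model quality; the sketch only shows the shape of the sweep over `calibration_tokens`.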
Expected outcome
Quality similar to offline CAT calibration with no extra calibration pass, at the cost of a small prefill overhead.
Priority
Medium (depends on the CAT diagonal results).
Source
AutoRepl: TODO-012 (buun, fork_dc582a)