submission: QK Gain Init 1.2 + Sliding Window Eval (stride=64) by outsourc-e · Pull Request #259 · openai/parameter-golf

outsourc-e · 2026-03-20T19:37:10Z

QK Gain Init 1.2 + Sliding Window Eval

Two orthogonal improvements over the naive baseline:

QK_GAIN_INIT=1.2 (default: 1.5): better attention stability during short runs

EVAL_STRIDE=64, EVAL_BATCH_SEQS=32: sliding-window evaluation in which each token is scored with 960+ tokens of context (consistent with a 1024-token window advanced 64 tokens at a time)
Adds forward_logits() and eval_val_sliding() functions (~70 lines)
Free ~0.03 bpb improvement: only evaluation changes, training is untouched
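To make the 960+ figure concrete, here is a minimal sketch of how a strided evaluation could lay out its windows. It is not the code from this PR: `sliding_windows`, `bits_per_byte`, and the 1024-token `seq_len` are assumptions chosen for illustration, and the real `eval_val_sliding()` may differ in batching and edge handling.

```python
import math


def sliding_windows(num_tokens, seq_len=1024, stride=64):
    """Yield (start, end, score_from) for each evaluation window.

    Hypothetical helper, not taken from the PR. The model would see
    tokens[start:end]; only positions in [score_from, end) contribute to
    the loss, and everything before score_from is context only.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in (0, seq_len]")
    if num_tokens <= 0:
        return
    # The first window has no earlier context to borrow, so score all of it.
    end = min(seq_len, num_tokens)
    yield 0, end, 0
    scored = end
    while scored < num_tokens:
        # Advance by `stride` new tokens, keeping the window at most seq_len wide.
        end = min(scored + stride, num_tokens)
        start = max(0, end - seq_len)
        yield start, end, scored
        scored = end


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)
```

Under these assumptions, every window after the first scores at most 64 new tokens, and the first scored token in each such window has at least 1024 - 64 = 960 tokens of preceding context. Each token is scored exactly once, so the summed loss can be passed to `bits_per_byte` unchanged. The trade-off is roughly seq_len / stride = 16 times more forward passes than non-overlapping evaluation, which is presumably why the windows are batched (EVAL_BATCH_SEQS=32).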

Looking forward to seeing H100 numbers from CI!

submission: QK Gain Init 1.2 + Sliding Window Eval (stride=64)

e808e2f