submission: QK Gain Init 1.2 + Sliding Window Eval (stride=64) #259

Open
outsourc-e wants to merge 1 commit into openai:main from outsourc-e:submission/qk-gain-init

QK Gain Init 1.2 + Sliding Window Eval

Two orthogonal improvements over the naive baseline:

1. QK Gain Initialization (training)

  • QK_GAIN_INIT=1.2 (default: 1.5) — better attention stability during short runs
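A rough illustration of why the initial gain matters, assuming the gain acts as a scalar multiplier on the query before the scaled dot product (this PR text does not show how `QK_GAIN_INIT` is consumed, so that is an assumption). The helper names are hypothetical. A smaller gain gives flatter attention weights at initialization:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(ps):
    """Shannon entropy in nats; higher means flatter attention."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def attention_weights(q, keys, gain):
    # Assumption: the gain scales the query, so it scales every logit.
    d = len(q)
    logits = [gain * sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              for k in keys]
    return softmax(logits)

# Toy query and keys, chosen only for illustration.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

w_12 = attention_weights(q, keys, gain=1.2)  # value used in this PR
w_15 = attention_weights(q, keys, gain=1.5)  # stated default

print(f"entropy @ gain=1.2: {entropy(w_12):.4f}")
print(f"entropy @ gain=1.5: {entropy(w_15):.4f}")
```

The lower gain starts with the softer distribution, which may be what helps in short runs, though the sketch only shows the effect on the initial softmax and not on training.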

2. Sliding Window Evaluation (eval-only)

  • EVAL_STRIDE=64, EVAL_BATCH_SEQS=32 — each token scored with 960+ tokens of context
  • Added forward_logits() and eval_val_sliding() functions (~70 lines)
  • Free ~0.03 bpb gain with zero training changes
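A minimal sketch of the window bookkeeping behind a strided evaluation, not the PR's actual `eval_val_sliding()` (its body is not shown here). It assumes a sequence length of 1024, inferred from 960 = 1024 - 64; `sliding_windows` and its tuple layout are hypothetical. Each window is fed to the model in full, but only its last `stride` tokens are scored, so every token is counted once:

```python
def sliding_windows(num_tokens, seq_len=1024, stride=64):
    """Return (start, end, score_from) tuples.

    The model would see tokens [start, end); only positions
    [score_from, end) contribute to the loss for that window.
    """
    windows = []
    # The first window has no earlier context, so all of it is scored.
    end = min(seq_len, num_tokens)
    windows.append((0, end, 0))
    scored = end
    while scored < num_tokens:
        # Advance by `stride` new tokens, keeping seq_len tokens in view.
        new_end = min(scored + stride, num_tokens)
        start = max(0, new_end - seq_len)
        windows.append((start, new_end, scored))
        scored = new_end
    return windows

windows = sliding_windows(2000, seq_len=1024, stride=64)

# Every token is scored exactly once.
total_scored = sum(end - score_from for _, end, score_from in windows)

# After the first window, each scored token has at least
# seq_len - stride = 960 tokens of preceding context.
min_context = min(score_from - start for start, _, score_from in windows[1:])

print(len(windows), total_scored, min_context)
```

The trade-off is cost: with `stride=64` the model runs roughly `seq_len / stride` = 16 times more forward passes over the validation set than a non-overlapping evaluation would, which is presumably why the windows are batched (`EVAL_BATCH_SEQS=32`).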

Local Results (RTX 4090, ~340 steps)

| Config | int8+zlib bpb |
| --- | --- |
| Baseline | 1.6353 |
| QK Gain 1.2 | 1.6133 |
| Sliding Window (raw) | 1.5879 |

Looking forward to seeing H100 numbers from CI!
