record: val_bpb=1.1622, NorMuon + int6 STE + SWA + sliding window by vmfunc · Pull Request #89 · openai/parameter-golf

vmfunc · 2026-03-19T15:16:56Z

mean val_bpb=1.1622 across 3 seeds on 8xH100 (1.1624, 1.1623, 1.1618). stacks six orthogonal improvements:

int6 STE: fake per-row int6 quantization during training w/ straight-through estimator. model learns to handle post-training quant. gap is only +0.002 bpb
fp16 embedding passthrough: tied embed/logit head kept in fp16 instead of quantized. most quant-sensitive tensor, no STE protection
MLP 3x (1536 hidden): int6 compression frees enough artifact bytes to fit the wider model
NorMuon: row-normalized Newton-Schulz updates (from modded-nanogpt). second-moment normalization on top of Muon
SWA: over 7 checkpoints during warmdown
sliding window eval: stride=64 (every scored token gets 960 tokens of context). ~0.033 bpb improvement
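
for readers unfamiliar with the int6 STE item, here is a minimal sketch of per-row fake quantization with a straight-through backward pass. it is not the code from this PR: numpy stands in for the training framework, the symmetric range [-31, 31] is an assumption, and `fake_quant_int6_per_row`, `linear_forward` and `linear_backward_ste` are hypothetical names.

```python
import numpy as np

QMAX = 31  # assumption: symmetric int6 range [-31, 31]

def fake_quant_int6_per_row(w):
    """Quantize each row to int6 levels, then dequantize back to float."""
    scale = np.abs(w).max(axis=1, keepdims=True) / QMAX
    scale = np.where(scale == 0.0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -QMAX, QMAX)
    return q * scale

def linear_forward(x, w):
    # The forward pass sees the quantized weights.
    return x @ fake_quant_int6_per_row(w).T

def linear_backward_ste(x, grad_y):
    # Straight-through estimator: round() is treated as identity in the
    # backward pass, so the gradient w.r.t. the quantized weights is
    # applied to the full-precision weights unchanged.
    return grad_y.T @ x

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))   # full-precision "master" weights
x = rng.normal(size=(2, 8))   # a tiny batch of inputs
w_q = fake_quant_int6_per_row(w)
y = linear_forward(x, w)
grad_w = linear_backward_ste(x, np.ones_like(y))
```

in an autograd framework the same effect is usually written as `w + stop_gradient(quant(w) - w)`, so the optimizer keeps updating the full-precision weights while the loss is computed on the quantized ones.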

run	seed	steps	post-quant bpb	sliding window bpb
1	1337	11917	1.1956	1.1624
2	42	11925	1.1955	1.1623
3	2025	11917	1.1951	1.1618
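
the "sliding window bpb" column comes from the stride=64 evaluation. a minimal sketch of how such windows could be laid out is below. the window length of 1024 is an assumption inferred from 960 context tokens plus a 64-token stride, and `sliding_windows` is a hypothetical helper, not a function from the submission.

```python
def sliding_windows(n_tokens, window=1024, stride=64):
    """Return (start, end, score_from) spans.

    The model is fed tokens[start:end]; only tokens[score_from:end]
    contribute to the loss, so each token is scored exactly once.
    """
    spans = []
    score_from = 0
    while score_from < n_tokens:
        if score_from == 0:
            # First window: nothing earlier exists, so score all of it.
            end = min(window, n_tokens)
        else:
            # Later windows: advance by one stride, score only the new tokens.
            end = min(score_from + stride, n_tokens)
        start = max(0, end - window)
        spans.append((start, end, score_from))
        score_from = end
    return spans

spans = sliding_windows(3000)
scored = sum(end - score_from for _, end, score_from in spans)
min_context = min(score_from - start for start, _, score_from in spans[1:])
```

apart from the first window, every scored token sees at least `window - stride` earlier tokens, which is where the 960 figure comes from under the assumed window length.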

artifact: 15.5MB (code 54KB + int6+zstd model 15.4MB). ~50ms/step, 600s wall clock

mean val_bpb=1.1622 across 3 seeds (1.1624, 1.1623, 1.1618). int6 fake quant w/ STE, fp16 embed passthrough, MLP 3x, NorMuon, stochastic weight averaging during warmdown, sliding window stride=64. 15.5MB artifact, 8xH100, 600s, ~12k steps.
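
the weight averaging during warmdown can be sketched as a running uniform mean over checkpoints. this is an illustration only: `WeightAverager` is a hypothetical class, numpy arrays stand in for model tensors, and the schedule for picking the 7 checkpoints is not specified here.

```python
import numpy as np

class WeightAverager:
    """Running uniform average of parameter dictionaries."""

    def __init__(self):
        self.count = 0
        self.avg = None

    def update(self, state):
        self.count += 1
        if self.avg is None:
            # Copy so later in-place training updates do not alias the average.
            self.avg = {k: np.array(v, dtype=np.float64) for k, v in state.items()}
        else:
            for k, v in state.items():
                # Incremental mean: avg += (x - avg) / n
                self.avg[k] += (v - self.avg[k]) / self.count

swa = WeightAverager()
for step in range(7):  # 7 checkpoints, as in the description
    checkpoint = {"w": np.full((2, 3), float(step))}  # toy checkpoint
    swa.update(checkpoint)
```

the averaged weights would then replace the final weights before quantization and evaluation.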

NorMuon adds per-row second-moment tracking after Newton-Schulz orthogonalization, then normalizes and rescales to preserve total norm. Based on arXiv:2510.05491 and PR openai#89. Expected -0.005 to -0.010 BPB improvement. Drop-in replacement (same class name). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
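
a minimal numpy sketch of the update described above: orthogonalize with Newton-Schulz, track a per-row second moment, normalize, then rescale to the pre-normalization norm. this is not the PR's optimizer. momentum and learning-rate handling are omitted, `beta2=0.95` is an assumed value, `normuon_update` is a hypothetical name, and the quintic coefficients are the ones used in the public Muon implementation.

```python
import numpy as np

def newton_schulz(g, steps=5, eps=1e-7):
    """Approximately orthogonalize g with a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation
    for _ in range(steps):
        A = x @ x.T
        B = b * A + c * (A @ A)
        x = a * x + B @ x
    return x.T if transposed else x

def normuon_update(grad, row_second_moment, beta2=0.95, eps=1e-8):
    o = newton_schulz(grad)
    norm_before = np.linalg.norm(o)
    # Per-row (per-neuron) second-moment estimate of the orthogonalized update.
    row_second_moment = (beta2 * row_second_moment
                         + (1 - beta2) * np.mean(o ** 2, axis=1, keepdims=True))
    o_hat = o / (np.sqrt(row_second_moment) + eps)
    # Rescale so the total (Frobenius) norm matches the pre-normalization norm.
    o_hat = o_hat * (norm_before / (np.linalg.norm(o_hat) + eps))
    return o_hat, row_second_moment

rng = np.random.default_rng(0)
grad = rng.normal(size=(4, 8))
v = np.zeros((4, 1))  # one second-moment entry per row
update, v = normuon_update(grad, v)
```

on the first step, with a zero-initialized second moment, every row of the update ends up with the same norm, which is the row-balancing effect the normalization is meant to provide.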

vmfunc force-pushed the submission/normuon-int6ste-swa-slidingwindow branch from f6a92be to c887ef4 Compare March 19, 2026 15:18

0hq added the record submission ready for review label Mar 19, 2026

mtybadger mentioned this pull request Mar 19, 2026

Record: Sliding Window Eval, 2048 Vocab Size, fp16 embeddings, SWA, NorMuon, FA3; mean_val_bpb:1.160 #122

Open

notapplica mentioned this pull request Mar 19, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

cocohearts added does not beat SOTA and removed record submission ready for review labels Mar 20, 2026

stukenov mentioned this pull request Mar 20, 2026

11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB) #264

Open

5 tasks

Gusanidas mentioned this pull request Apr 1, 2026

Record: Window Attention + Mixed Seq_Len Training, bpb 1.1108, eval at 6144 (5-seed mean) #1212

Open

vmfunc wants to merge 1 commit into openai:main from vmfunc:submission/normuon-int6ste-swa-slidingwindow
