Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) by notapplica · Pull Request #60 · openai/parameter-golf

notapplica · 2026-03-19T06:52:24Z

Summary

Mean val_bpb: 1.1748 (3 seeds, p<0.001)

Stacks 6 orthogonal improvements over the baseline:

Sliding window evaluation (stride=64, seq_len=1024): beyond the first window, every token is scored with at least 960 tokens of context (1024 - 64)
FP16 tied embedding export: keep tok_emb in FP16 instead of int8-quantizing it, because the tied matrix sits on both the input and output paths and quantization errors compound
10 transformer layers (up from 9): Muon weight decay shrinks the compressed artifact enough to fit the extra layer under the size limit
Decoupled weight decay for the Muon optimizer (0.02): Muon has no built-in regularization, and adding p.mul_(1 - wd * lr) improves both generalization and quantization
Overtone spectral embedding init: SVD power-law spectrum shaping
Phase-transition residual mixing: sigmoid-scheduled resid_mix initialization
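
To make the "960+ context" figure concrete, here is a minimal sketch of how a sliding-window evaluation can be laid out. It is an illustration under stated assumptions, not the code from this PR: the function name `sliding_windows` and the choice to score the whole first window are hypothetical.

```python
def sliding_windows(n_tokens, seq_len=1024, stride=64):
    """Yield (start, end, score_from) for each evaluation window.

    Tokens in [score_from, end) are scored; tokens in [start, score_from)
    only provide context. Assumption: the first window scores all of its
    tokens, and each later window scores only its newest `stride` tokens.
    """
    if n_tokens <= 0:
        return
    end = min(seq_len, n_tokens)
    yield 0, end, 0
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        start = max(0, new_end - seq_len)
        yield start, new_end, end
        end = new_end


if __name__ == "__main__":
    windows = list(sliding_windows(5000))
    # Smallest context seen by any token scored after the first window.
    min_ctx = min(score_from - start for start, _, score_from in windows[1:])
    print(len(windows), "windows, minimum context after the first:", min_ctx)
```

With stride=64 and seq_len=1024, the first scored token of every later window sits 960 positions into its window, which is where the 960 in the summary comes from. The cost is one forward pass per 64 new tokens instead of one per 1024.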

Seed	val_loss	val_bpb	Steps	ms/step
1337	1.9849	1.1756	10424	57.55
42	1.9827	1.1742	10710	56.06
7	1.9830	1.1744	10498	57.18
Mean	1.9835	1.1748
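
The Mean row can be re-derived from the per-seed rows. The sketch below uses only the rounded values shown in the table, so the last digit may differ slightly from the reported mean, which was presumably computed from unrounded numbers.

```python
from statistics import mean, stdev

# Per-seed results copied from the table above (already rounded to 4 places).
val_loss = {1337: 1.9849, 42: 1.9827, 7: 1.9830}
val_bpb = {1337: 1.1756, 42: 1.1742, 7: 1.1744}

mean_loss = mean(val_loss.values())
mean_bpb = mean(val_bpb.values())
std_bpb = stdev(val_bpb.values())  # sample standard deviation across seeds

print(f"mean val_loss = {mean_loss:.4f}")
print(f"mean val_bpb  = {mean_bpb:.4f} (sample std {std_bpb:.4f})")
```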

Artifact: ~14.7 MB (under 16 MB limit)

Add NTK Eval + Overtone Init submission (1.2160 BPB)

Train@1024 with overtone embedding init and phase-transition residual mixing, eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb 1.2160 across 3 seeds (p=0.0012 for 0.0194-nat improvement over baseline).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
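
For readers unfamiliar with NTK-aware RoPE scaling, the following is a minimal sketch of the commonly used formulation, in which the rotary base is multiplied by scale^(d/(d-2)) whenever the evaluated sequence is longer than the training length. The base of 10000, the head dimension of 64, and the function names are assumptions for illustration; the PR's actual implementation is not shown on this page.

```python
def ntk_rope_base(seq_len, train_len=1024, base=10000.0, head_dim=64):
    """Return the RoPE base for `seq_len`, rescaled only when seq_len > train_len."""
    if seq_len <= train_len:
        return base
    scale = seq_len / train_len
    # NTK-aware scaling: stretch low frequencies, leave the highest one untouched.
    return base * scale ** (head_dim / (head_dim - 2))


def rope_inv_freq(seq_len, train_len=1024, base=10000.0, head_dim=64):
    """Inverse frequencies for each rotary pair at the given sequence length."""
    b = ntk_rope_base(seq_len, train_len, base, head_dim)
    return [b ** (-2 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    f_train = rope_inv_freq(1024)
    f_eval = rope_inv_freq(2048)
    print("highest frequency ratio:", f_eval[0] / f_train[0])
    print("lowest frequency ratio: ", f_eval[-1] / f_train[-1])
```

In this formulation the highest frequency is unchanged while the lowest is divided by exactly the length ratio, so short-range position information is preserved and only the long-range components are interpolated.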


0hq

Looks good to me!

FI-Mihej · 2026-03-20T02:43:06Z

@0hq, this looks moltbot-ty to me. Just look at the issues opened by it:

notapplica · 2026-03-20T03:03:14Z

#138 is me lolol
Not moltbot but somewhat automated (i steer) (:
I have one claude working on the challenge and one claude analyzing everything in public

openai#59: 5-min + TTT, 258 steps, TTT didn't improve undertrained model
openai#60: 10-min no TTT, 515 steps, best prequant 1.4038, sliding eval incomplete

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… + late QAT scaffold

Based on merged SOTA openai#60 (1.1748 BPB) with phased additions:
- Phase A: Int6-range export + selective fp16 passthrough (QUANT_BITS, COMPRESSOR)
- Phase B: MLP 3x (MLP_MULT=3)
- Phase C: Late QAT scaffold (QAT_ENABLED=0 default, activates at 75% with LR drop)
- Phase D: EMA scaffold (EMA_ENABLED=0 default, decay=0.997)

All features gated behind env vars, defaults match openai#60 behavior. Includes RUNBOOK.md with exact H100 run commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
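
As a rough illustration of the env-var gating described in Phases C and D, here is a minimal sketch. Only the names QAT_ENABLED and EMA_ENABLED, the 75% activation point, and the 0.997 decay come from the commit message; the helper names, the constants `QAT_START_FRAC` and `EMA_DECAY`, and the plain-list weights are hypothetical stand-ins, and the LR drop is not modeled.

```python
import os

# Both features default to off, matching the "defaults match openai#60" note.
QAT_ENABLED = bool(int(os.environ.get("QAT_ENABLED", "0")))
EMA_ENABLED = bool(int(os.environ.get("EMA_ENABLED", "0")))
QAT_START_FRAC = 0.75  # hypothetical name for the 75% activation point
EMA_DECAY = 0.997      # decay value quoted in the commit message


def qat_active(step, total_steps, enabled=QAT_ENABLED):
    """Late QAT: only switch on once 75% of training has elapsed."""
    return enabled and step >= QAT_START_FRAC * total_steps


def ema_update(ema, weights, decay=EMA_DECAY):
    """In-place exponential moving average: ema <- decay*ema + (1-decay)*w."""
    for i, w in enumerate(weights):
        ema[i] = decay * ema[i] + (1.0 - decay) * w
    return ema
```

A training loop would call `ema_update` after each optimizer step when EMA_ENABLED is set, and check `qat_active(step, total_steps)` before each forward pass to decide whether to apply fake quantization.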

@notapplica

Credit: @notapplica PR openai#60 (Muon WD), @raahilshah PR openai#162 (ortho init). Weight decay 0.04 regularizes weights for better generalization and compressibility. Orthogonal init accelerates early convergence. Grad clip 0.3 stabilizes training. val_bpb 1.2649, compressed 14.7MB (-0.5MB from weight decay). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
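
The weight decay credited here is the decoupled form that the PR summary writes as p.mul_(1 - wd * lr). Below is a minimal sketch using plain Python floats in place of tensors. The function name, the order of decay versus update, and the learning rate in the usage line are assumptions; in a real run the `updates` would come from Muon's orthogonalized momentum.

```python
def decoupled_weight_decay_step(params, updates, lr, wd):
    """One optimizer step with decoupled weight decay on a list of floats.

    The decay multiplies the parameter directly instead of being added to
    the gradient, so it does not pass through the optimizer's own update rule.
    """
    for i in range(len(params)):
        params[i] *= 1.0 - wd * lr    # mirrors p.mul_(1 - wd * lr)
        params[i] -= lr * updates[i]  # placeholder for the optimizer update
    return params


if __name__ == "__main__":
    # With zero updates, weights shrink geometrically toward zero.
    w = [1.0, -2.0]
    for _ in range(3):
        decoupled_weight_decay_step(w, [0.0, 0.0], lr=0.05, wd=0.04)
    print(w)
```

Smaller weight magnitudes tend to quantize and compress better, which is consistent with the 0.5 MB artifact reduction the commit message attributes to weight decay.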

@mattqlf

Sliding window eval (credit @mattqlf PR openai#50) with configurable stride. Muon weight decay 0.04 (credit @notapplica PR openai#60). Orthogonal init with muP scaling (credit @raahilshah PR openai#162). Gradient clipping at 0.3. int8 roundtrip val_bpb: 1.2653. Sliding window would add ~0.03 on 8xH100. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
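
The "gradient clipping at 0.3" above most likely refers to clipping by global norm, although the commit message does not say so. Under that assumption, a minimal sketch looks like this (pure Python, with a flat list standing in for all gradient tensors; the function name and `eps` are illustrative):

```python
import math


def clip_grad_norm(grads, max_norm=0.3, eps=1e-6):
    """Rescale `grads` so their global L2 norm is at most `max_norm`.

    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


if __name__ == "__main__":
    clipped, norm_before = clip_grad_norm([3.0, 4.0])
    norm_after = math.sqrt(sum(g * g for g in clipped))
    print(f"norm before = {norm_before:.3f}, after = {norm_after:.3f}")
```

Gradients whose norm is already below the threshold pass through unchanged, so the clip only acts on the occasional large step.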

…val_bpb=1.1748) (openai#60)

* Add NTK Eval + Overtone Init submission (1.2160 BPB)

  Train@1024 with overtone embedding init and phase-transition residual mixing, eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb 1.2160 across 3 seeds (p=0.0012 for 0.0194-nat improvement over baseline).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)
* Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)
* Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)
* Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone
* Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone

---------

Co-authored-by: notapplica <notapplica@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


notapplica and others added 6 commits March 18, 2026 23:51

b55c058  Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)
810f573  Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)
5e0d7e5  Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)
2a90936  Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone
56dc745  Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone

0hq added the record submission ready for review label Mar 19, 2026

notapplica changed the title ~~NTK Eval + Overtone Init (val_bpb=1.2160)~~ Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) Mar 19, 2026

0hq approved these changes Mar 19, 2026


0hq merged commit 9fbdf8c into openai:main Mar 19, 2026

notapplica mentioned this pull request Mar 19, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

stukenov mentioned this pull request Mar 20, 2026

11L Int5-MLP + TTT-SGD + SmearGate + SWA (1.1455 BPB) #264

Open

5 tasks

MatoTeziTanka mentioned this pull request Mar 21, 2026

PROTEUS v4 — non-record submission (val_bpb: 1.2037) #368

Open

resouer pushed a commit to resouer/parameter-golf that referenced this pull request Mar 29, 2026

exp: tighten openai#60 launch gate on lineage port

0516bad

0hq merged 6 commits into openai:main from notapplica:submission/ntk-eval-overtone-init


3 participants
