
Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502)#86

Merged
cocohearts merged 3 commits into openai:main from
aruniyer:submission/10L-lowlr-fp16embed-int6
Mar 20, 2026

Conversation

@aruniyer

The README content covers everything: 5-seed results, t-stat, methodology

… 1.2129)

10-layer transformer with mixed-precision export achieving mean val_bpb=1.2129
across 5 seeds on 8xH100 SXM, improving on the naive baseline by 0.0248 bpb
(t=34.12, p<<0.001).

Key changes:
- 10 layers (vs 9 baseline)
- Lower LRs: MATRIX_LR=0.02, SCALAR_LR=0.02, TIED_EMBED_LR=0.03
- FP16 tied embedding export (reduces quant gap)
- Int6 quantization for middle layers 2-7 (fits under 16MB)

Mean artifact size: 15.36MB (under 16MB cap).
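The int6 export above can be sketched as a symmetric per-tensor quantize/dequantize pair. This is a minimal illustration, not the PR's actual export code; the function names and the choice of a symmetric [-31, 31] grid are assumptions:

```python
import numpy as np

def quantize_int6(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int6: map weights onto the 63 levels in [-31, 31]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    # stored in int8 containers; only 6 bits of range are used
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def dequantize_int6(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int6(w)
w_hat = dequantize_int6(q, scale)
# round-to-nearest error is bounded by half a quantization step
assert np.abs(w - w_hat).max() <= scale / 2 + 1e-6
```

Keeping the embedding in FP16 while the middle blocks go to int6 trades a few bits on the layers most sensitive to rounding, which is consistent with the "reduces quant gap" note above.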

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Major upgrade from previous 10L submission (1.2129 -> 1.1652 BPB).

Key changes:
- 9L with MLP_MULT=3 (wider MLP, 3x expansion, 21.8M params)
- QAT: STE fake-quantize simulates int6 during training
- Int6 quantization on all block weights (layers 0-8)
- Sliding window eval (stride=64) for ~0.033 BPB free gain
- FP16 tied embedding + lower LRs (carried over)
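The STE fake-quantize in the QAT bullet can be sketched as a quantize-dequantize applied in the forward pass only. This is an illustrative stand-alone version (the PR's training code is not shown); in an autograd framework the straight-through estimator treats the rounding as identity on the backward pass, e.g. via the usual `w + (fake_quant(w) - w).detach()` trick:

```python
import numpy as np

def fake_quant_int6(w: np.ndarray) -> np.ndarray:
    """Forward: round onto the int6 grid and scale back to float.
    Backward (in an autograd framework): straight-through, i.e.
    d(fake_quant)/dw ~= 1, so full-precision weights still get gradients."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -31, 31) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128))
w_q = fake_quant_int6(w)
# fake-quant is idempotent: values already on the grid stay put
assert np.allclose(fake_quant_int6(w_q), w_q)
```

Training against the quantized weights this way means the int6 export at the end introduces no distribution shift the model has not already seen.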

5-seed results on 8xH100 SXM:
  Mean slide_bpb: 1.1652 (std=0.0017)
  Mean rt_bpb:    1.1985
  t-statistic:    78.93 (p << 0.001)
  All artifacts under 16MB (mean: 15.64MB)
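The stride-64 sliding-window eval can be sketched as follows (an assumed implementation; the PR's eval code is not shown). Each window scores only its last `stride` tokens, so every token after the first window is predicted with near-full left context instead of the short context a disjoint-chunk eval gives tokens near chunk starts — which is where the "free" BPB gain comes from:

```python
def window_spans(n_tokens: int, ctx: int, stride: int) -> list[tuple[int, int, int]]:
    """Return (window_start, window_end, score_from) triples.

    Tokens [score_from, window_end) are scored in each window; the model
    conditions on everything from window_start, so scored tokens see up to
    `ctx` tokens of context rather than resetting every `ctx` tokens.
    """
    spans = []
    scored = 0
    while scored < n_tokens:
        end = min(ctx, n_tokens) if not spans else min(scored + stride, n_tokens)
        start = max(0, end - ctx)
        spans.append((start, end, scored))
        scored = end
    return spans

spans = window_spans(n_tokens=300, ctx=256, stride=64)
# every token is scored exactly once
covered = [t for (_, end, frm) in spans for t in range(frm, end)]
assert covered == list(range(300))
```

Smaller strides give more context per scored token at the cost of proportionally more forward passes.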

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aruniyer aruniyer changed the title 10L Mixed Precision: val_bpb=1.2129 (lower LR + fp16 embed + int6 middle) Update: MLP 3x + QAT + Int6 + Sliding Window (val_bpb 1.1652) Mar 20, 2026
Major upgrade: 11 layers + decoupled weight decay + zstd-22 compression.

Key changes:
- 11 layers (was 9) — more depth, funded by int6+zstd compression
- Weight decay 0.04 on Muon + AdamW — quantization-friendly weights
- zstd-22 compression — saves 1.5MB vs zlib, critical for 11L fit
- Higher Muon momentum (0.99) + warmup tuning
- SWA attempted but dropped (hurts with QAT)
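"Decoupled" weight decay here is in the AdamW sense: the decay is applied directly to the weights rather than folded into the gradient, so it is not rescaled by the adaptive step. A minimal sketch (hyperparameters other than wd=0.04 are illustrative, not the PR's values):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.04):
    """One AdamW update with decoupled weight decay."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w * (1 - lr * wd)                        # decay acts on weights directly...
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # ...independent of the Adam step
    return w, m, v

w = np.ones(4)
w, m, v = adamw_step(w, g=np.zeros(4), m=np.zeros(4), v=np.zeros(4), t=1)
# with zero gradient only the decay acts: w shrinks by exactly (1 - lr*wd)
assert np.allclose(w, 1 - 3e-4 * 0.04)
```

A plausible reading of "quantization-friendly" above: decay shrinks the weights' dynamic range, and a smaller max-abs means a finer int6 step for the same 6 bits.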

3-seed results on 8xH100 SXM:
  Mean slide_bpb: 1.1502 (std=0.0004)
  t-statistic: 313.20 (p << 0.001)
  All artifacts under 16MB (mean 15.4MB)
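The reported t-statistic is presumably a one-sample t of the per-seed BPBs against the baseline; with only 3 seeds, the very small std (0.0004) is what produces such a large t. A sketch with illustrative numbers (the baseline value and per-seed results used for this comparison are not listed here):

```python
import math

def one_sample_t(vals: list[float], baseline: float) -> float:
    """t = (baseline - mean) / (s / sqrt(n)), with sample std (n-1 denominator)."""
    n = len(vals)
    mean = sum(vals) / n
    var = sum((x - mean) ** 2 for x in vals) / (n - 1)
    return (baseline - mean) / math.sqrt(var / n)

# illustrative per-seed values, not the PR's actual measurements
t = one_sample_t([1.0, 1.1, 1.2], baseline=1.5)
```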

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@aruniyer aruniyer changed the title Update: MLP 3x + QAT + Int6 + Sliding Window (val_bpb 1.1652) Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502) Mar 20, 2026
@cocohearts cocohearts merged commit b774930 into openai:main Mar 20, 2026
leonardcser pushed a commit to leonardcser/parameter-golf that referenced this pull request Mar 21, 2026
…mbed-int6

Update: 11L MLP3x + WD=0.04 + zstd-22 (val_bpb 1.1502)
