warmdown-quantization val_bpb = 1.2154 #61
Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full warmdown sweep across 10 values and detailed analysis in README.
- Add a PR-audit research log entry covering the clean takeaways from pull requests openai#36 through openai#70
- Promote long-context training plus matching long-context eval as a first-class clean branch, based on PR openai#61 and PR openai#63
- Refine mixed-precision export notes to emphasize using int6/int8 byte savings to fund wider MLP capacity, based on PR openai#65
- Update the current snapshot and research thesis so future agents do not over-focus on exporter-only ideas after the broader PR sweep
- Fix the PR-audit notes to attribute the long-context branch to PR openai#65 rather than PR openai#61
- Record PR openai#61 as schedule-side evidence that long warmdown reduces quantization damage
- Keep the ideas backlog aligned with the actual GitHub PR content before using it for next-step decisions
Force-pushed from 50225c5 to 4a65f69
Confused: your title says 1.1574, but your header says "val_bpb = 1.2154 (baseline: 1.2244, improvement: 0.009 BPB / 0.017 nats)".
0hq left a comment
Tentatively approving, just to keep the leaderboard up to date for others.
Before I officially add this to the leaderboard, mind running again to verify that the 1.1574 result is within noise? 1-2 more runs showing the same result would be great.
Ah sorry, I see that this is no longer the latest PR; moving there.
Hey Will, sorry about the confusion in the documentation here. I saw people updating their PRs and thought that would be a good idea, so I went back to update this one, then changed my mind and reverted it, but didn't update the title and comment since I figured nobody would be looking this far back. I missed the title; it's updated now.
* Warmdown-quantization co-optimization, val_bpb=1.2154

  Novel finding: aggressive LR decay (WARMDOWN_ITERS=20000) reduces int8 quantization penalty from 0.014 to 0.005 BPB. Combined with FP16 tied embeddings and moderate NTK-RoPE extrapolation (eval@1408). Full warmdown sweep across 10 values and detailed analysis in README.

* breakthrough: 1.1574 BPB via int6 + MLP 3x + sliding window stride=256

---------

Co-authored-by: Sam Larson <saml212@users.noreply.github.com>
Score
val_bpb = 1.2154 (baseline: 1.2244, improvement: 0.009 BPB / 0.017 nats)
Key Finding
Post-training int8 quantization is the dominant BPB bottleneck on 8xH100. The quantization penalty alone (0.014 BPB at default settings) is larger than most hyperparameter improvements combined. We reduce it roughly 3x, to 0.005 BPB, via an always-decaying LR schedule.
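The quantization penalty above can be estimated by round-tripping the weights through int8 and re-running eval. A minimal sketch, assuming symmetric per-tensor scaling (the actual exporter may use per-channel scales or a different clipping rule):

```python
import torch

def int8_roundtrip(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8 quantize -> dequantize. Illustrative stand-in
    # for the export path, not the repo's exact exporter.
    scale = w.abs().max().clamp(min=1e-12) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return q * scale

# The penalty is then measured as:
#   penalty_bpb = eval_bpb(model with round-tripped weights) - eval_bpb(fp model)
```

The per-element round-trip error is bounded by half a quantization step (scale/2), which is why smoother, smaller-magnitude weights late in training quantize with less BPB damage.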
Novel Contributions
1. Always-decaying LR schedule (WARMDOWN_ITERS=20000): With ~12,200 actual steps, the LR decays linearly from 61% of peak at step 0 to near-zero at the final step. Post-quant penalty drops from 0.014 to 0.005 BPB. Full curve mapped across 10 warmdown values (2400-30000).
2. FP16 tied embeddings: Keep tok_emb.weight in fp16 during int8 export. Costs ~500KB of artifact budget, offset by setting MLP_HIDDEN=992.
3. Optimal NTK-RoPE extrapolation: eval@1408 (1.375x training length) beats eval@2048 on well-trained 8xH100 models. Full curve from 1024 to 2048.
4. Optimizer-warmdown interaction: MUON_BACKEND_STEPS=5 beats 7 at high warmdown (a reversal from low warmdown). Once warmdown has already smoothed the weights, cheaper, less exact orthogonalization wins out over extra backend steps.
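Contribution 1 can be sketched concretely. With a linear warmdown over the final WARMDOWN_ITERS steps of the schedule and WARMDOWN_ITERS (20000) exceeding the actual step count (~12,200), the decay segment covers the whole run: LR starts at 12200/20000 = 61% of peak and hits zero at the final step, matching the numbers above. Function and argument names here are illustrative, not the repo's API:

```python
def warmdown_lr(step, peak_lr, num_iters=12200, warmdown_iters=20000):
    """Linear warmdown over the final `warmdown_iters` steps of the schedule.

    When warmdown_iters > num_iters, the decay spans the entire run: the LR
    starts at (num_iters / warmdown_iters) * peak and reaches zero at the end.
    """
    decay_start = num_iters - warmdown_iters  # negative in this configuration
    if step < decay_start:
        return peak_lr  # constant phase (never reached when warmdown > run)
    return peak_lr * (num_iters - step) / warmdown_iters
```

For example, `warmdown_lr(0, 1.0)` gives 0.61 and `warmdown_lr(12200, 1.0)` gives 0.0, i.e. the "61% of peak at step 0 to near-zero at the final step" behavior described above.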
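Contribution 3 relies on NTK-aware RoPE scaling. A hedged sketch of the standard NTK-aware base inflation, assuming the common formulation (the PR's exact variant and head_dim are not shown here): the rotary base is scaled so the lowest frequencies stretch to the longer eval context while the highest frequencies stay near the trained regime.

```python
import torch

def ntk_rope_inv_freq(head_dim=64, base=10000.0, train_len=1024, eval_len=1408):
    # NTK-aware RoPE: inflate the base by scale^(d/(d-2)) so low-frequency
    # dimensions cover eval_len while high-frequency dimensions are nearly
    # unchanged. Defaults are illustrative, not the repo's config.
    scale = max(eval_len / train_len, 1.0)  # 1.375 for eval@1408 on 1024-trained
    ntk_base = base * scale ** (head_dim / (head_dim - 2))
    return 1.0 / ntk_base ** (torch.arange(0, head_dim, 2).float() / head_dim)
```

The eval@1408 sweet spot in the sweep corresponds to scale=1.375: enough extra context to help BPB, mild enough that the high-frequency rotations stay in-distribution.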
Configuration
15.91MB artifact. 8xH100 SXM (RunPod). See README.md for full warmdown sweep data.