
Non-record: Systematic Hyperparameter Search (val_bpb=1.2075)#141

Open
nglain wants to merge 1 commit into openai:main from nglain:submission/systematic-search

Conversation


nglain commented Mar 20, 2026

Summary

| Metric | Value |
| --- | --- |
| Post-quant val_bpb | 1.2075 |
| Pre-quant val_bpb | 1.2008 |
| Compressed artifact | ~15.2 MB |
| Training steps | 7,390 |
| Training time | 600s (8×H100 SXM) |

Approach

A methodical hyperparameter search spanning 33 experiments across three GPU tiers (A40 → 1×H100 → 8×H100), using fixed-seed paired comparisons (SEED=1337) so that val_bpb deltas down to ±0.001 BPB are reliably attributable to the change under test.
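A minimal sketch of the paired-comparison idea (helper name and baseline numbers are illustrative, not from the PR): two runs share SEED=1337 and differ only in the hyperparameter under test, so a val_bpb delta is attributed to that change only once it clears the ±0.001 BPB noise floor.

```python
# Sketch of fixed-seed paired comparison (hypothetical helper; illustrative numbers).
NOISE_FLOOR_BPB = 0.001  # run-to-run noise at a fixed seed

def compare(baseline_bpb: float, candidate_bpb: float) -> str:
    """Classify a candidate run against its seed-matched baseline."""
    delta = candidate_bpb - baseline_bpb
    if abs(delta) <= NOISE_FLOOR_BPB:
        return f"inconclusive (delta {delta:+.4f} within noise)"
    verdict = "better" if delta < 0 else "worse"  # lower BPB is better
    return f"{verdict} by {abs(delta):.4f} BPB"

# A 0.005 BPB improvement clears the noise floor; a 0.0003 shift does not.
print(compare(1.2125, 1.2075))  # -> better by 0.0050 BPB
print(compare(1.2075, 1.2078))  # -> inconclusive (delta +0.0003 within noise)
```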

What works

  • Muon optimizer (lr=0.02, momentum=0.99, warmdown=3000): -0.005 BPB
  • ROPE_BASE=200000: -0.003 BPB
  • seq_len=4096: -0.006 BPB
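The warmdown=3000 setting suggests the learning rate decays over the final 3,000 of the 7,390 steps. A sketch of a constant-then-linear-warmdown schedule (the exact shape used by train_gpt.py is an assumption):

```python
# Assumed constant-then-linear-warmdown LR schedule; the actual
# train_gpt.py schedule may differ in shape or final value.
MATRIX_LR = 0.02
WARMDOWN_ITERS = 3000
TOTAL_ITERS = 7390  # steps reported in this PR

def lr_at(step: int) -> float:
    warmdown_start = TOTAL_ITERS - WARMDOWN_ITERS
    if step < warmdown_start:
        return MATRIX_LR  # constant phase
    # linear decay to zero over the final WARMDOWN_ITERS steps
    frac = (TOTAL_ITERS - step) / WARMDOWN_ITERS
    return MATRIX_LR * frac
```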

What doesn't work

  • int6 STE + Muon: the two interact badly (+0.007 BPB worse)
  • 12 layers: slower per step, so fewer steps fit the time budget
  • Larger batch (786K): the drop in step count outweighs the per-step quality gain
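For context on the int6 STE finding, a sketch of symmetric per-tensor int6 fake quantization, forward pass only (the PR's actual quantizer and clipping range are assumptions): under a straight-through estimator, gradients pass through the rounding unchanged, and it is that interaction with Muon's updates which the PR reports as harmful.

```python
import numpy as np

# Sketch of symmetric per-tensor int6 fake quantization (forward only).
# Range [-31, 31] is an assumption; an STE would backprop through
# this rounding as if it were the identity.
QMAX = 31

def fake_quant_int6(w: np.ndarray) -> np.ndarray:
    scale = float(np.abs(w).max()) / QMAX
    if scale == 0.0:
        return w  # all-zero tensor quantizes to itself
    q = np.clip(np.round(w / scale), -QMAX, QMAX)
    return q * scale
```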

Key insight

Optimal hyperparameters differ dramatically across compute budgets: the best LR found on A40/2min (0.10) is 5× the best on 8×H100/10min (0.02). Every parameter must be re-validated at the target compute scale.

Changes from baseline

Only hyperparameters: MATRIX_LR=0.02, MUON_MOMENTUM=0.99, WARMDOWN_ITERS=3000, ROPE_BASE=200000, TRAIN_SEQ_LEN=4096. No architectural changes.
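The full set of overrides, written out as environment variables for reference (whether train_gpt.py reads them from the environment or as in-script constants is an assumption):

```shell
# Hyperparameter overrides from this PR; consumption mechanism assumed.
export MATRIX_LR=0.02
export MUON_MOMENTUM=0.99
export WARMDOWN_ITERS=3000
export ROPE_BASE=200000
export TRAIN_SEQ_LEN=4096
```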

Test plan

  • Trained on 8×H100 SXM, 600s wallclock
  • final_int8_zlib_roundtrip val_bpb: 1.2075
  • Artifact under 16,000,000 bytes
  • train_gpt.py compiles and runs from records folder
  • train.log included
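A sketch of the int8 + zlib size check behind the 16,000,000-byte cap (helper names are hypothetical; the record's actual final_int8_zlib_roundtrip harness may serialize differently):

```python
import io
import zlib
import numpy as np

# Hypothetical size check: quantize each tensor to symmetric per-tensor
# int8, concatenate the raw bytes, and measure the zlib-compressed size.
SIZE_CAP_BYTES = 16_000_000

def compressed_size(weights: dict) -> int:
    buf = io.BytesIO()
    for name, w in sorted(weights.items()):  # deterministic order
        scale = float(np.abs(w).max()) / 127
        if scale == 0.0:
            scale = 1.0  # avoid division by zero for all-zero tensors
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        buf.write(q.tobytes())
    return len(zlib.compress(buf.getvalue(), level=9))

# Usage: assert compressed_size(state_dict_as_numpy) < SIZE_CAP_BYTES
```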

