submission: Int6 MLP3x + QAT + SlidingWindow (val_bpb: 1.1702) by trovatochris · Pull Request #117 · openai/parameter-golf

trovatochris · 2026-03-19T19:05:09Z

Summary

Stacks int6 per-row quantization with zstd22 compression, a 3x MLP expansion, QAT weight-snapping starting at 70% of training, a tuned Muon optimizer (momentum=0.99 with warmup), an extended warmdown (3000 iters), and stride-64 sliding-window evaluation.
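For readers unfamiliar with the terms, here is a minimal pure-Python sketch of what per-row int6 quantization and QAT-style weight snapping could look like. It is not taken from train_gpt.py: the symmetric range [-31, 31], the one-float-scale-per-row layout, and the function names are all assumptions made for illustration.

```python
def quantize_row_int6(row):
    """Symmetric per-row quantization to the 6-bit range [-31, 31] (assumed range)."""
    max_abs = max(abs(x) for x in row)
    # One scale per row; an all-zero row gets a dummy scale of 1.0.
    scale = max_abs / 31.0 if max_abs > 0 else 1.0
    q = [max(-31, min(31, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q, scale):
    """Map the integer codes back to floats."""
    return [v * scale for v in q]


def snap_weights(matrix):
    """QAT-style snapping: replace every weight by its quantize/dequantize value,
    so the remaining training steps see the values that will actually be stored."""
    return [dequantize_row(*quantize_row_int6(row)) for row in matrix]


if __name__ == "__main__":
    weights = [[0.5, -1.0, 0.25, 0.0], [3.1, -0.1, 0.05, 1.55]]
    for row, snapped in zip(weights, snap_weights(weights)):
        print(row, "->", snapped)
```

Per-row scales keep the rounding error of each row bounded by half of that row's own step size, so rows with small weights are not penalized by a large outlier elsewhere in the matrix.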

Key Metrics

seed1337
final_sliding_window_eval_exact val_loss:1.97577705 val_bpb:1.17016796
Total submission size int6+zstd22: 15,306,777 bytes
Under 16MB: YES (693,223 bytes headroom)
Training steps: 9,540/20,000 (600s wallclock cap)
Average step time: 62.86ms
Model params: 21,778,504
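The reported numbers can be cross-checked with a few lines of arithmetic. The sketch below copies the values from the metrics above; treating "16MB" as 16,000,000 bytes and val_loss as nats per token are assumptions, although the first one matches the stated headroom exactly.

```python
import math

# Values copied from the Key Metrics list.
val_loss = 1.97577705          # assumed to be cross-entropy in nats per token
val_bpb = 1.17016796           # bits per byte
submission_bytes = 15_306_777
size_limit_bytes = 16_000_000  # assumption: decimal megabytes
steps = 9_540
avg_step_ms = 62.86

headroom = size_limit_bytes - submission_bytes   # 693,223 bytes, as reported
train_seconds = steps * avg_step_ms / 1000       # just under the 600 s cap

# If val_loss is nats per token, the two loss figures imply an average
# number of bytes per token for the validation set (roughly 2.44).
bits_per_token = val_loss / math.log(2)
implied_bytes_per_token = bits_per_token / val_bpb

if __name__ == "__main__":
    print(f"headroom: {headroom:,} bytes")
    print(f"training time: {train_seconds:.1f} s")
    print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
```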

Configuration

MLP_MULT=3 VAL_LOSS_EVERY=0 MATRIX_LR=0.02 SCALAR_LR=0.02 TIED_EMBED_LR=0.03 MUON_MOMENTUM=0.99 MUON_MOMENTUM_WARMUP_STEPS=1500 MUON_MOMENTUM_WARMUP_START=0.92 WARMDOWN_ITERS=3000 ENABLE_QAT=1 QAT_START_FRAC=0.7 USE_SLIDING_EVAL=1 EVAL_STRIDE=64
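A hedged sketch of how three of these settings might be interpreted: the Muon momentum warmup, the QAT start fraction, and the sliding-window evaluation stride. The linear ramp shape, the meaning of "progress" (steps or wallclock), the helper names, and the context_len parameter are assumptions; the PR does not spell them out, so train_gpt.py may differ.

```python
# Values copied from the configuration line.
MUON_MOMENTUM = 0.99
MUON_MOMENTUM_WARMUP_START = 0.92
MUON_MOMENTUM_WARMUP_STEPS = 1500
QAT_START_FRAC = 0.7
EVAL_STRIDE = 64


def muon_momentum(step):
    """Assumed linear ramp from the warmup start value to the final momentum."""
    frac = min(step / MUON_MOMENTUM_WARMUP_STEPS, 1.0)
    return (1.0 - frac) * MUON_MOMENTUM_WARMUP_START + frac * MUON_MOMENTUM


def qat_active(progress):
    """progress is the fraction of the training budget already used (0..1)."""
    return progress >= QAT_START_FRAC


def sliding_windows(num_tokens, context_len, stride=EVAL_STRIDE):
    """Yield (start, end, score_from) triples for strided evaluation.

    The model sees tokens start..end-1 but only positions score_from..end-1
    are scored, so every token is scored exactly once and, after the first
    window, with at least context_len - stride tokens of preceding context
    (the final window may be shorter).
    context_len is a hypothetical parameter; the PR does not state it.
    """
    prev_end = 0
    for start in range(0, num_tokens, stride):
        end = min(start + context_len, num_tokens)
        yield start, end, prev_end
        prev_end = end
        if end == num_tokens:
            break


if __name__ == "__main__":
    print(muon_momentum(0), muon_momentum(750), muon_momentum(1500))
    print(list(sliding_windows(300, 128)))
```

A smaller stride gives each scored token more context at the cost of more forward passes, which is the usual trade-off behind a setting like EVAL_STRIDE=64.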

Command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Requires the zstandard package (pip install zstandard) for zstd compression.

trovatochris added 2 commits March 19, 2026 11:51

Create README.md

b9909ba

Create train_gpt.py

ba47f6e

notapplica mentioned this pull request Mar 20, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

alexanderaperry-arch mentioned this pull request Mar 27, 2026

QAT x SWA Ablation: SWA sabotages QAT (-3.64 mBPB, 3-seed validated) #989

Open

trovatochris wants to merge 2 commits into openai:main from trovatochris:main
